User`s guide

Measure and Improve GPU Performance
9-37
on the GPU, rewrites the code to use arrayfun for element-wise operations, and finally
shows how to integrate a custom CUDA kernel for the same operation.
Alternately, you can write a CUDA kernel as part of a MEX-file and call it using the
CUDA Runtime API inside the MEX-file. Either of these approaches might let you work
with low-level features of the GPU, such as shared memory and texture memory, that
are not directly available in MATLAB code. For more details, see the example Accessing
Advanced CUDA Features Using MEX.
Best Practices for Improving Performance
Hardware Configuration
In general you can achieve the best performance when your GPU is dedicated to
computing. It is usually not practical to use the same GPU device for both computations
and graphics, because of the amount of memory taken up for problems of reasonable
size and the constant use of the device by the system for graphics. If possible, obtain a
separate device for graphics. Details of configuring your device for compute or graphics
depend on the operating system and driver version.
On Windows systems, a GPU device can be in one of two modes: Windows Display Driver
Model (WDDM) or Tesla Compute Cluster (TCC) mode. For best performance, any
devices used for computing should be in TCC mode. Consult NVIDIA documentation for
more details.
NVIDIA’s highest-performance compute devices, the Tesla line, support error correcting
codes (ECC) when reading and writing GPU memory. The purpose of ECC is to correct
for occasional bit-errors that occur normally when reading or writing dynamic memory.
One technique to improve performance is to turn off ECC to increase the achievable
memory bandwidth. While the hardware can be configured this way, MathWorks does
not recommend this practice. The potential loss of accuracy due to silent errors can be
more harmful than the performance benefit.
MATLAB Coding Practices
This topic describes general techniques that help you achieve better performance on the
GPU. Some of these tips apply when writing MATLAB code for the CPU as well.
Data in MATLAB arrays is stored in column-major order. Therefore, it is beneficial
to operate along the first or column dimension of your array. If one dimension of
your data is significantly longer than others, you might achieve better performance