User's Guide
9 GPU Computing
if you make that the first dimension. Similarly, if you frequently operate along a
particular dimension, it is usually best to have it as the first dimension. In some cases, if
consecutive operations target different dimensions of an array, it might be beneficial to
transpose or permute the array between these operations.
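As a rough illustration of permuting an array so that the dimension you operate along comes first, consider the sketch below. The array sizes and variable names are arbitrary choices for this example, not taken from the guide. It runs on ordinary arrays, and the same calls should also apply to gpuArray data.

```matlab
% Illustrative data: 64 pages of 128-by-256 values (sizes are arbitrary).
A = rand(128, 256, 64);

% Operating along the third dimension works directly on A ...
s1 = sum(A, 3);                 % 128-by-256 result

% ... but if many operations target that dimension, it may help to
% move it to the front once and then work along dimension 1.
B = permute(A, [3 1 2]);        % B is 64-by-128-by-256
s2 = sum(B, 1);                 % 1-by-128-by-256 result
s2 = reshape(s2, 128, 256);     % same layout as s1

% Both approaches give the same values up to floating-point round-off.
maxDiff = max(abs(s1(:) - s2(:)));
```

Whether the permute pays for itself depends on how many subsequent operations use the new leading dimension, so it is worth timing both variants on your own data.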
GPUs achieve high performance by calculating many results in parallel. Thus, matrix
and higher-dimensional array operations typically perform much better than operations
on vectors or scalars. You can achieve better performance by rewriting your loops to
make use of higher-dimensional operations. The process of revising loop-based, scalar-
oriented code to use MATLAB matrix and vector operations is called vectorization. For
more details, see “Using Vectorization”.
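The following minimal sketch contrasts a scalar loop with its vectorized equivalent. The computation itself is a made-up example chosen only to show the pattern.

```matlab
% Loop-based, scalar-oriented version (illustrative computation).
n = 1000;
x = linspace(0, 2*pi, n);
y1 = zeros(1, n);
for k = 1:n
    y1(k) = sin(x(k))^2 + 0.5*x(k);
end

% Vectorized version: one array expression replaces the loop.
y2 = sin(x).^2 + 0.5*x;

% The two results agree to within round-off.
maxDiff = max(abs(y1 - y2));
```

If x were a gpuArray, the vectorized expression would run on the GPU as whole-array operations, whereas the loop would issue one small operation per element.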
By default, all operations in MATLAB are performed in double-precision floating-point
arithmetic. However, most operations support a variety of data types, including integer
and single-precision floating-point. Today’s GPUs and CPUs typically have much higher
throughput when performing single-precision operations, and single-precision floating-
point data occupies less memory. If your application’s accuracy requirements allow the
use of single-precision floating-point, it can greatly improve the performance of your
MATLAB code.
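As a small sketch of switching to single precision (matrix sizes and variable names are illustrative assumptions), you can either convert existing double data or create single data directly:

```matlab
% Create data in double precision (the default) and convert to single.
Ad = rand(1000);             % class double
As = single(Ad);             % class single

% Alternatively, create single-precision data directly.
Bs = rand(1000, 'single');

% Results computed from single inputs stay in single precision.
Cs = As * Bs;

% Compare the memory used by the double and single copies.
infoD = whos('Ad');
infoS = whos('As');
memoryRatio = infoD.bytes / infoS.bytes;   % double uses twice the bytes
```

Before adopting single precision throughout, compare a few results against the double-precision version to confirm the accuracy is acceptable for your application.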
The GPU sits at the end of a data transfer mechanism known as the PCI bus. While this
bus is an efficient, high-bandwidth way to transfer data from the PC host memory to
various extension cards, it is still much slower than the overall bandwidth to the global
memory of the GPU device or of the CPU (for more details, see the example Measuring
GPU Performance). In addition, transfers from the GPU device to MATLAB host memory
cause MATLAB to wait for all pending operations on the device to complete before
executing any other statements. This can significantly hurt the performance of your
application. In general, you should limit the number of times you transfer data between
the MATLAB workspace and the GPU. If you can transfer data to the GPU once at the
start of your application, perform all the calculations you can on the GPU, and then
transfer the results back into MATLAB at the end, that generally results in the best
performance. Similarly, when possible it helps to create arrays directly on the GPU,
using either the 'gpuArray' or the 'like' option for functions such as zeros (e.g., Z =
zeros(___,'gpuArray') or Z = zeros(N,'like',g) for existing gpuArray g).
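The pattern described above (transfer once, compute on the GPU, gather once) might look like the following sketch. It assumes Parallel Computing Toolbox and a supported GPU are available. The size N and the particular calculation are placeholders for illustration.

```matlab
% Requires Parallel Computing Toolbox and a supported GPU.
N = 2000;                              % illustrative size

% Transfer input data to the GPU once.
hostData = rand(N, 'single');
g = gpuArray(hostData);

% Create other arrays directly on the GPU instead of copying them over.
Z = zeros(N, 'single', 'gpuArray');    % explicit class, created on the GPU
W = ones(N, 'like', g);                % matches the class and device of g

% Keep intermediate results on the GPU (placeholder calculation).
R = g * W + Z;
R = sqrt(abs(R));

% Transfer the final result back to host memory once.
result = gather(R);
```

Calling gather only at the end avoids the repeated synchronization points that intermediate transfers would introduce.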
Measure Performance on the GPU
The best way to measure performance on the GPU is to use gputimeit. This function
takes as input a function handle with no input arguments, and returns the measured
execution time of that function. It takes care of such benchmarking considerations as