User's guide
Measure and Improve GPU Performance
In this section...
“Basic Workflow for Improving Performance” on page 9-35
“Advanced Tools for Improving Performance” on page 9-36
“Best Practices for Improving Performance” on page 9-37
“Measure Performance on the GPU” on page 9-38
“Vectorize for Improved GPU Performance” on page 9-39
Basic Workflow for Improving Performance
The purpose of GPU computing in MATLAB is to speed up your applications. This
topic discusses fundamental concepts and practices that can help you achieve better
performance on the GPU, such as the configuration of the GPU hardware and best
practices within your code. It discusses the trade-off between implementation difficulty
and performance, and describes the criteria you might use to choose between using
gpuArray functions, arrayfun, MEX-files, or CUDA kernels. Finally, it describes how to
accurately measure performance on the GPU.
When converting MATLAB code to run on the GPU, it is best to start with MATLAB
code that already performs well. While the GPU and CPU have different performance
characteristics, the general guidelines for writing good MATLAB code also help you write
good MATLAB code for the GPU. The first step is almost always to profile your CPU
code. The lines that the profiler reports as taking the most time on the CPU are
likely the ones to concentrate on when you write code for the GPU.
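As a minimal sketch of this profiling step, the following script times a small CPU workload with the MATLAB profiler. The workload itself (the matrix size and the fft and element-wise operations) is a hypothetical stand-in for your own code, not an example taken from this guide.

```matlab
% Hypothetical CPU workload; substitute the code you plan to move to the GPU.
A = rand(2000);              % 2000-by-2000 matrix in CPU memory

profile on                   % start collecting timing data
B = fft(A);                  % column-wise FFT
C = real(B .* conj(B));      % element-wise power spectrum
s = sum(C(:));               % reduce to a scalar
profile off                  % stop collecting

stats = profile('info');     % results as a structure (includes FunctionTable)
% profile viewer             % uncomment to open the interactive report
```

The interactive report lists time per function and per line, which is usually the quickest way to see which statements are worth converting first.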
It is easiest to start converting your code using MATLAB built-in functions that support
gpuArray data. These functions take gpuArray inputs, perform calculations on the
GPU, and return gpuArray outputs. A list of the MATLAB functions that support
gpuArray data is found in “Run Built-In Functions on a GPU” on page 9-8. In general,
these functions support the same arguments and data types as the standard MATLAB
functions that run on the CPU. Any limitations of these overloaded functions
for gpuArray inputs are described in their command-line help (for example, help gpuArray/qr).
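The following sketch illustrates how built-in functions behave with gpuArray inputs. It assumes that Parallel Computing Toolbox is installed and a supported GPU is available. The array size and the choice of fft are illustrative only.

```matlab
% Assumes Parallel Computing Toolbox and a supported GPU device.
A = rand(1000, 'single');    % data created in CPU memory
G = gpuArray(A);             % copy the data to GPU memory

F = fft(G);                  % overloaded fft runs on the GPU
P = real(F .* conj(F));      % element-wise operations also run on the GPU

% Outputs of gpuArray-enabled functions are themselves gpuArray objects.
disp(isa(P, 'gpuArray'))

% Show the GPU-specific help, including any limitations, for one function.
help gpuArray/qr
```

Because each result stays on the GPU, consecutive calls can be chained without copying intermediate data back to CPU memory.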
If all the functions that you want to use are supported on the GPU, running code on
the GPU may be as simple as calling gpuArray to transfer input data to the GPU, and
calling gather to retrieve the output data from the GPU when finished. In many cases,