User's guide

9 GPU Computing
you might need to vectorize your code, replacing looped scalar operations with MATLAB
matrix and vector operations. While vectorizing is generally a good practice on the CPU,
it is usually critical for achieving high performance on the GPU. For more information,
see “Vectorize for Improved GPU Performance” on page 9-39.
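As a minimal sketch of what vectorizing means here, the following compares a looped scalar computation with its vectorized equivalent on a gpuArray. The polynomial, the variable names, and the array size are illustrative assumptions, not taken from this guide.

```matlab
% Hypothetical data: n sample points, created on the CPU and moved to the GPU.
n = 1000;
x = gpuArray(linspace(0, 1, n));

% Looped scalar version: every iteration performs several tiny GPU operations.
yLoop = gpuArray(zeros(1, n));
for k = 1:n
    yLoop(k) = 3*x(k).^2 + 2*x(k) + 1;
end

% Vectorized version: one element-wise expression over the whole array.
yVec = 3*x.^2 + 2*x + 1;

% The two results should agree to within floating-point round-off.
maxDiff = gather(max(abs(yLoop - yVec)));
```

Wrapping each version in tic and toc (followed by wait(gpuDevice) so that the GPU work has finished) is one way to compare their run times on your own hardware.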
Advanced Tools for Improving Performance
It is possible that even after converting inputs to gpuArrays and vectorizing your code,
there are operations in your algorithm that are either not built-in functions, or that are
not fast enough to meet your application’s requirements. In such situations you have
three main options: use arrayfun to precompile element-wise parts of your application,
make use of GPU library functions, or write a custom CUDA kernel.
If you have a purely element-wise function, you can improve its performance by calling
it with arrayfun. The arrayfun function on the GPU turns an element-wise MATLAB
function into a custom CUDA kernel, thus reducing the overhead of performing the
operation. Often, there is a subset of your application that can be used with arrayfun
even if the entire application cannot be. The example “Improve Performance of Element-wise
MATLAB Functions on the GPU using ARRAYFUN” shows the basic concepts of this
approach, and the example “Using ARRAYFUN for Monte-Carlo Simulations” shows how
to apply it to simulations in a finance application.
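The following sketch illustrates calling an element-wise function with arrayfun on gpuArray data. The function dampedWave and its constants are hypothetical and chosen only for illustration; the point is that the function body uses scalar operations only.

```matlab
% Hypothetical element-wise function: the body operates on one scalar t at a time.
dampedWave = @(t) exp(-2*t) .* sin(2*pi*t);

% Input data on the GPU.
t = gpuArray(linspace(0, 1, 1000));

% arrayfun applies the function to every element of t on the GPU.
y = arrayfun(dampedWave, t);

% Reference result from ordinary vectorized gpuArray operations.
yRef = exp(-2*t) .* sin(2*pi*t);
maxDiff = gather(max(abs(y - yRef)));
```

Whether the arrayfun version is actually faster than the plain vectorized expression depends on the function and the data size, so it is worth timing both.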
MATLAB provides an extensive library of GPU-enabled functions in Parallel Computing
Toolbox, Image Processing Toolbox™, Signal Processing Toolbox™, and other products.
However, there are many libraries of additional functions that do not have direct built-
in analogs in MATLAB’s GPU support. Examples include the NVIDIA Performance
Primitives library and the CURAND library, which are included in the CUDA toolkit
that ships with MATLAB. If you need to call a function in one of these libraries, you can
do so using the GPU MEX interface. This interface allows you to extract the pointers to
the device data from MATLAB gpuArrays so that you can pass these pointers to GPU
functions. You can convert the returned values into gpuArrays for return to MATLAB.
For more information, see “Run MEX-Functions Containing CUDA Code” on page 9-31.
Finally, you have the option of writing a custom CUDA kernel for the operation that
you need. Such kernels can be directly integrated into MATLAB using the CUDAKernel
object.
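A rough sketch of the CUDAKernel workflow might look like the following. The file names addVectors.cu and addVectors.ptx and the kernel prototype are hypothetical: they stand for a kernel that you write and compile to PTX yourself, and they are not shipped with MATLAB.

```matlab
% Hypothetical files: a CUDA source file and the PTX file compiled from it,
% for example with "nvcc -ptx addVectors.cu".
cuFile  = 'addVectors.cu';
ptxFile = 'addVectors.ptx';

% Assumed kernel prototype inside addVectors.cu:
%   __global__ void addVectors(double *v1, const double *v2)
% where thread number threadIdx.x adds one element of v2 into v1.

if exist(ptxFile, 'file') == 2 && exist(cuFile, 'file') == 2
    % Create the kernel object from the compiled PTX and the source prototype.
    k = parallel.gpu.CUDAKernel(ptxFile, cuFile);

    % Launch one thread per element (n must not exceed k.MaxThreadsPerBlock).
    n = 128;
    k.ThreadBlockSize = [n 1 1];

    v1 = gpuArray(ones(n, 1));
    v2 = gpuArray(rand(n, 1));

    % Non-const pointer arguments (here v1) are returned as outputs of feval.
    result = gather(feval(k, v1, v2));
else
    disp('Compile addVectors.cu to PTX before running this sketch.');
end
```

The guard around the kernel call only keeps the sketch from erroring when the hypothetical files are absent; in real code you would normally create the kernel object directly.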
The example Illustrating Three Approaches to GPU Computing: The Mandelbrot Set
shows how to implement a simple calculation using three of the approaches mentioned
in this section. This example begins with MATLAB code that is easily converted to run