User's guide

9 GPU Computing

you might need to vectorize your code, replacing looped scalar operations with MATLAB

matrix and vector operations. While vectorizing is generally a good practice on the CPU,

it is usually critical for achieving high performance on the GPU. For more information,

see “Vectorize for Improved GPU Performance” on page 9-39.
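
As a minimal sketch of what vectorizing means here, the following compares a looped scalar computation with the equivalent whole-array expression. The input data and the formula are hypothetical and chosen only for illustration; any speedup depends on your hardware and problem size.

```matlab
% Hypothetical input data, created on the CPU and transferred to the GPU.
x = gpuArray(linspace(0, 2*pi, 1000));

% Looped scalar version: every iteration issues separate small GPU operations.
yLoop = zeros(size(x), 'gpuArray');
for k = 1:numel(x)
    yLoop(k) = sin(x(k))^2 + 0.5*x(k);
end

% Vectorized version: the same computation written as whole-array operations.
yVec = sin(x).^2 + 0.5*x;

% The two results should agree to within floating-point round-off.
maxDiff = gather(max(abs(yLoop - yVec)));
```

The vectorized line replaces the whole loop, so the GPU processes all elements in a few large operations rather than many small ones.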

Advanced Tools for Improving Performance

It is possible that even after converting inputs to gpuArrays and vectorizing your code,

there are operations in your algorithm that are either not built-in functions, or that are

not fast enough to meet your application’s requirements. In such situations, you have

three main options: use arrayfun to precompile element-wise parts of your application,

make use of GPU library functions, or write a custom CUDA kernel.

If you have a purely element-wise function, you can improve its performance by calling

it with arrayfun. The arrayfun function on the GPU turns an element-wise MATLAB

function into a custom CUDA kernel, thus reducing the overhead of performing the

operation. Often, there is a subset of your application that can be used with arrayfun

even if the entire application cannot be. The example Improve Performance of Element-wise

MATLAB Functions on the GPU using ARRAYFUN shows the basic concepts of this

approach, and the example Using ARRAYFUN for Monte-Carlo Simulations shows how

this can be done in simulations for a finance application.
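
The arrayfun approach described above can be sketched as follows. The function f and its damping parameter are hypothetical, not taken from the examples named above. The parameter is passed as an explicit argument because this sketch assumes the function handle should not depend on workspace variables.

```matlab
% Hypothetical element-wise function: a damped sine wave.
% It takes scalars in and returns a scalar, so arrayfun can apply it per element.
f = @(x, a) exp(-a * x) * sin(x);

% Input data on the GPU.
x = gpuArray(linspace(0, 10, 1e5));

% Because x is a gpuArray, this call runs f as a single element-wise kernel.
% The scalar 0.1 is expanded to match the size of x.
y = arrayfun(f, x, 0.1);

% Reference result from ordinary vectorized gpuArray operations.
yRef = exp(-0.1 * x) .* sin(x);
maxDiff = gather(max(abs(y - yRef)));
```

The vectorized reference launches several separate GPU operations (scale, exp, sin, multiply), whereas the arrayfun call is intended to fuse them into one, which is where the reduced overhead comes from.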

MATLAB provides an extensive library of GPU-enabled functions in Parallel Computing

Toolbox, Image Processing Toolbox™, Signal Processing Toolbox™, and other products.

However, there are many libraries of additional functions that do not have direct built-in

analogs in MATLAB’s GPU support. Examples include the NVIDIA Performance

Primitives library and the CURAND library, which are included in the CUDA toolkit

that ships with MATLAB. If you need to call a function in one of these libraries, you can

do so using the GPU MEX interface. This interface allows you to extract the pointers to

the device data from MATLAB gpuArrays so that you can pass these pointers to GPU

functions. You can convert the returned values into gpuArrays for return to MATLAB.

For more information, see “Run MEX-Functions Containing CUDA Code” on page 9-31.

Finally, you have the option of writing a custom CUDA kernel for the operation that

you need. Such kernels can be directly integrated into MATLAB using the CUDAKernel

object.
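
As a rough sketch of the CUDAKernel workflow, the following writes a small kernel to disk, compiles it to PTX, and runs it from MATLAB. The kernel addVec and the file names addVec.cu and addVec.ptx are hypothetical. The sketch also assumes that the nvcc compiler is installed and on the system path, which is not guaranteed on every system.

```matlab
% Write a small CUDA source file (hypothetical name addVec.cu) from MATLAB.
src = {
    '__global__ void addVec(double* out, const double* a, const double* b, int n)'
    '{'
    '    int i = blockDim.x * blockIdx.x + threadIdx.x;'
    '    if (i < n) out[i] = a[i] + b[i];'
    '}'
    };
fid = fopen('addVec.cu', 'w');
fprintf(fid, '%s\n', src{:});
fclose(fid);

% Compile to PTX. Assumption: nvcc is installed and on the system path.
status = system('nvcc -ptx addVec.cu');
assert(status == 0, 'nvcc failed; check the CUDA toolkit installation.');

% Create the kernel object from the PTX file and the source prototype.
k = parallel.gpu.CUDAKernel('addVec.ptx', 'addVec.cu');

% Configure the launch so that there is at least one thread per element.
n = 1000;
k.ThreadBlockSize = [256 1 1];
k.GridSize = [ceil(n / 256) 1];

% Inputs on the GPU. The non-const pointer argument (out) is returned as output.
a = rand(n, 1, 'gpuArray');
b = rand(n, 1, 'gpuArray');
out = feval(k, zeros(n, 1, 'gpuArray'), a, b, int32(n));

% Compare with the built-in gpuArray addition.
maxDiff = gather(max(abs(out - (a + b))));
```

For an operation this simple the built-in a + b is the better choice. A custom kernel becomes worthwhile only when the operation cannot be expressed efficiently with built-in functions or arrayfun.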

The example Illustrating Three Approaches to GPU Computing: The Mandelbrot Set

shows how to implement a simple calculation using three of the approaches mentioned

in this section. This example begins with MATLAB code that is easily converted to run