User's Guide

9 GPU Computing
9-26
GridSize — A vector of three elements, the product of which determines the number
of blocks.
ThreadBlockSize — A vector of three elements, the product of which determines
the number of threads per block. (Note that the product cannot exceed the value of the
property MaxThreadsPerBlock.)
The default value for both of these properties is [1 1 1], but suppose you want to
use 500 threads to run element-wise operations on vectors of 500 elements in parallel.
A simple way to achieve this is to create your CUDAKernel and set its properties
accordingly:
k = parallel.gpu.CUDAKernel('myfun.ptx','myfun.cu');
k.ThreadBlockSize = [500,1,1];
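For illustration, the device code in a file like myfun.cu might look as follows. This is a hypothetical sketch, not the actual kernel from the example: the kernel name, signature, and the doubling operation are assumptions. With ThreadBlockSize set to [500,1,1] and the default GridSize of [1 1 1], threadIdx.x runs from 0 to 499, giving one thread per vector element.

```
// Hypothetical element-wise kernel (assumed signature and operation).
// Launched with one block of 500 threads, each thread handles one element.
__global__ void myfun(double *out, const double *in, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x; // global thread index
    if (idx < n)                  // guard in case launch sizes change
        out[idx] = 2.0 * in[idx]; // example element-wise operation
}
```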
Generally, you set the grid and thread block sizes based on the sizes of your inputs.
For information on thread hierarchy, and multiple-dimension grids and blocks, see the
NVIDIA CUDA C Programming Guide.
Run a CUDAKernel
“Use Workspace Variables” on page 9-26
“Use gpuArray Variables” on page 9-27
“Determine Input and Output Correspondence” on page 9-27
Use the feval function to evaluate a CUDAKernel on the GPU. The following examples
show how to execute a kernel using MATLAB workspace variables and gpuArray
variables.
Use Workspace Variables
Assume that you have already written some kernels in a native language and want to
use them in MATLAB to execute on the GPU. You have a kernel that does a convolution
on two vectors; load and run it with two random input vectors:
k = parallel.gpu.CUDAKernel('conv.ptx','conv.cu');
result = feval(k,rand(100,1),rand(100,1));
Even when the inputs are constants or variables held in the MATLAB workspace, the
output is a gpuArray.
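If you need the result back in host memory, the standard step is to call gather, which copies a gpuArray into an ordinary MATLAB array. A minimal sketch, continuing from the convolution example above:

```
% result from feval is a gpuArray residing in GPU memory;
% gather copies it back into an ordinary MATLAB array.
resultHost = gather(result);
```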