User's Guide

9 GPU Computing
9-26
GridSize — A vector of three elements, the product of which determines the number
of blocks.
ThreadBlockSize — A vector of three elements, the product of which determines
the number of threads per block. (Note that the product cannot exceed the value of the
property MaxThreadsPerBlock.)
The default value for both of these properties is [1 1 1], but suppose you want to
use 500 threads to run element-wise operations on vectors of 500 elements in parallel.
A simple way to achieve this is to create your CUDAKernel and set its properties
accordingly:
k = parallel.gpu.CUDAKernel('myfun.ptx','myfun.cu');
k.ThreadBlockSize = [500,1,1];
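For illustration, the device code in a file like myfun.cu might look as follows. This is a hypothetical sketch, not the actual kernel from the example: the kernel name, signature, and the doubling operation are assumptions. With ThreadBlockSize set to [500,1,1] and the default GridSize of [1 1 1], threadIdx.x runs from 0 to 499, giving one thread per vector element.

```
// Hypothetical element-wise kernel (assumed signature and operation).
// Launched with one block of 500 threads, each thread handles one element.
__global__ void myfun(double *out, const double *in, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x; // global thread index
    if (idx < n)                  // guard in case launch sizes change
        out[idx] = 2.0 * in[idx]; // example element-wise operation
}
```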
Generally, you set the grid and thread block sizes based on the sizes of your inputs.
For information on thread hierarchy, and multiple-dimension grids and blocks, see the
NVIDIA CUDA C Programming Guide.
Run a CUDAKernel
“Use Workspace Variables” on page 9-26
“Use gpuArray Variables” on page 9-27
“Determine Input and Output Correspondence” on page 9-27
Use the feval function to evaluate a CUDAKernel on the GPU. The following examples
show how to execute a kernel using MATLAB workspace variables and gpuArray
variables.
Use Workspace Variables
Assume that you have already written some kernels in a native language and want to
use them in MATLAB to execute on the GPU. You have a kernel that does a convolution
on two vectors; load and run it with two random input vectors:
k = parallel.gpu.CUDAKernel('conv.ptx','conv.cu');
result = feval(k,rand(100,1),rand(100,1));
Even when the inputs are constants or variables held in the MATLAB workspace, the
output is a gpuArray.
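If you need the result back in host memory, the standard step is to call gather, which copies a gpuArray into an ordinary MATLAB array. A minimal sketch, continuing from the convolution example above:

```
% result from feval is a gpuArray residing in GPU memory;
% gather copies it back into an ordinary MATLAB array.
resultHost = gather(result);
```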