User's Guide

Run CUDA or PTX Code on GPU

Compile the CU code at the shell command line to generate a PTX file called test.ptx.

nvcc -ptx test.cu

Create the kernel in MATLAB. Currently this PTX file has only one entry point, so you do not need to specify it. If the file contained more kernels, you would specify add1 as the entry point.

k = parallel.gpu.CUDAKernel('test.ptx','test.cu');

Run the kernel with two numeric inputs. By default, a kernel runs on one thread.

result = feval(k,2,3)

result =

     5
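To see why the kernel runs on a single thread, you can inspect its launch configuration. The following is a minimal sketch, assuming that test.ptx and test.cu are in the current folder and that a supported GPU is available. The variable name hostResult is illustrative only.

```matlab
% Sketch: inspect the launch configuration of the kernel, then copy
% the result back to the MATLAB workspace.
k = parallel.gpu.CUDAKernel('test.ptx','test.cu');

disp(k.ThreadBlockSize)        % threads per block along each dimension
disp(k.GridSize)               % thread blocks along each dimension

result = feval(k,2,3);         % the kernel output is returned as a gpuArray
hostResult = gather(result);   % transfer the value to host memory
```

Both properties default to [1 1 1], which corresponds to one block containing one thread.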

Add Two Vectors

This example extends the previous one to add two vectors together. For simplicity, assume that there are exactly as many threads as elements in the vectors and that there is only one thread block.

The CU code is slightly different from the last example. Both inputs are pointers, and one is declared const because the kernel does not modify it. Each thread adds the pair of elements at its own thread index, so the kernel reads the thread index to work out which element this thread should add. (Getting these thread- and block-specific values is a very common pattern in CUDA programming.)

__global__ void add2( double * v1, const double * v2 )
{
    int idx = threadIdx.x;
    v1[idx] += v2[idx];
}

Save this code in the file test.cu.

Compile as before using nvcc.

nvcc -ptx test.cu

If this code is in the same CU file as the code of the first example, you need to specify the entry point name this time to distinguish it.

k = parallel.gpu.CUDAKernel('test.ptx','test.cu','add2');
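Because add2 assumes one thread per element in a single thread block, the block size must match the vector length before you run the kernel. The following is a minimal sketch under that assumption. The vector length N = 128 and the variable names in1, in2, and hostResult are illustrative choices, not requirements.

```matlab
% Sketch: run add2 with one thread per vector element in a single block.
N = 128;                               % illustrative vector length (assumption)
k = parallel.gpu.CUDAKernel('test.ptx','test.cu','add2');

% A single thread block must be able to hold all N threads.
assert(N <= k.MaxThreadsPerBlock,'N exceeds the block size limit.');
k.ThreadBlockSize = N;                 % GridSize keeps its default of one block

in1 = ones(N,1,'gpuArray');            % v1: read and updated by the kernel
in2 = ones(N,1,'gpuArray');            % v2: read only
result = feval(k,in1,in2);             % updated v1, returned as a gpuArray

hostResult = gather(result);           % N-by-1 vector in host memory
```

For vectors longer than the block size limit, this one-block approach no longer applies, and the kernel would need to compute its index from both the block and thread indices.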