User's Guide
Run CUDA or PTX Code on GPU
2
Compile the CU code at the shell command line to generate a PTX file called
test.ptx.
nvcc -ptx test.cu
3
Create the kernel in MATLAB. Currently this PTX file has only one entry point, so you
do not need to specify it. If the file contained more than one kernel, you would
specify add1 as the entry point.
k = parallel.gpu.CUDAKernel('test.ptx','test.cu');
4
Run the kernel with two numeric inputs. By default, a kernel runs on one thread.
result = feval(k,2,3)
result =
5
Add Two Vectors
This example extends the previous one to add two vectors together. For simplicity,
assume that there are exactly the same number of threads as elements in the vectors and
that there is only one thread block.
1
The CU code is slightly different from the last example. Both inputs are pointers,
and one is constant because you are not changing it. Each thread simply adds the
elements at its thread index, so the thread index determines which element this
thread works on. (Getting these thread- and block-specific values is a very common
pattern in CUDA programming.)
__global__ void add2( double * v1, const double * v2 )
{
    int idx = threadIdx.x;
    v1[idx] += v2[idx];
}
Save this code in the file test.cu.
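The add2 kernel relies on the simplifying assumption of a single thread block with one thread per element. As a minimal sketch of how the same thread-index pattern might extend to several thread blocks, the variant below combines the block index and the thread index. The kernel name add2_multiblock and the length parameter n are illustrative assumptions and are not part of this guide's example.

```cuda
// Hypothetical variant of add2 for launches that use several thread blocks.
// The name add2_multiblock and the parameter n are illustrative assumptions.
__global__ void add2_multiblock( double * v1, const double * v2, int n )
{
    // Global element index: this block's offset plus the thread's index in it.
    int idx = blockIdx.x * blockDim.x + threadIdx.x;

    // Threads that fall past the end of the vectors do nothing.
    if (idx < n) {
        v1[idx] += v2[idx];
    }
}
```

The bounds check matters because the total number of launched threads is a multiple of the block size and can exceed the number of elements.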
2
Compile as before using nvcc.
nvcc -ptx test.cu
3
If this code is in the same CU file as the code of the first example, you
must specify the entry point name this time to distinguish it.
k = parallel.gpu.CUDAKernel('test.ptx','test.cu','add2');
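For illustration, a test.cu file that holds both kernels might look like the sketch below. The body of add1 is an assumption: the first example's source is not shown in this section, so it is reconstructed here only from its observed behavior (two numeric inputs, 2 and 3, producing 5).

```cuda
// Sketch of a test.cu that contains both entry points.
// add1 is an assumed reconstruction of the first example's kernel:
// it adds a scalar to the single value that v points to.
__global__ void add1( double * v, double c )
{
    *v += c;
}

// add2 as given in this example: one thread per element, one thread block.
__global__ void add2( double * v1, const double * v2 )
{
    int idx = threadIdx.x;
    v1[idx] += v2[idx];
}
```

Because the PTX file generated from such a source has two entry points, the third argument 'add2' in the CUDAKernel call selects the vector kernel. Under the same assumption, passing 'add1' would select the scalar one.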