User's Guide

Run CUDA or PTX Code on GPU


__global__ void simplestKernelEver( float * x, float val )

then the PTX code contains an entry that might be called

_Z18simplestKernelEverPff.
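The example name above is consistent with Itanium C++ ABI style mangling, where a free function's name is encoded as "_Z", then the length of the identifier, then the identifier itself, followed by the parameter types (here "Pf" for float * and "f" for float). The following is a minimal sketch under that assumption; mangledPrefix is a hypothetical helper written for illustration, not part of any toolkit, and it builds only the leading part of the name.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds the leading part of an Itanium-style mangled
// name for a free (non-namespaced) function:
//   "_Z" + <identifier length> + <identifier>
// The parameter encoding that follows (for example "Pff") is NOT generated.
std::string mangledPrefix(const std::string& name) {
    assert(!name.empty());  // an empty identifier cannot be mangled
    return "_Z" + std::to_string(name.size()) + name;
}
```

Under this assumption, mangledPrefix("simplestKernelEver") gives "_Z18simplestKernelEver", which is the start of the entry name shown above. The actual name in your PTX file depends on the compiler, so check the file itself rather than relying on a predicted name.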

When you have multiple entry points, specify the entry name for the particular kernel

when calling CUDAKernel to generate your kernel.

Note: The CUDAKernel function searches for your entry name in the PTX file, and

matches on any substring occurrences. Therefore, you should not name any of your

entries as substrings of any others.

You might not have control over the original entry names, in which case you must be

aware of the unique mangled name derived for each. For example, consider the following

function template.

template <typename T>

__global__ void add4( T * v1, const T * v2 )

{

int idx = threadIdx.x;

v1[idx] += v2[idx];

}

When the template is expanded out for float and double, it results in two entry points,

both of which contain the substring add4.

template __global__ void add4<float>(float *, const float *);

template __global__ void add4<double>(double *, const double *);

The PTX has corresponding entries:

_Z4add4IfEvPT_PKS0_

_Z4add4IdEvPT_PKS0_

Use entry point add4If for the float version, and add4Id for the double version.

k = parallel.gpu.CUDAKernel('test.ptx','double *, const double *','add4Id');
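To see why the shorter name add4 would be ambiguous here, the substring matching described in the note can be imitated with a plain string search. This is only a sketch of the described behavior, not the toolbox's implementation; matchingEntries and kAdd4Entries are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: returns every entry name that contains `query` as a
// substring. It imitates the matching rule described in the note; it is not
// how CUDAKernel is actually implemented.
std::vector<std::string> matchingEntries(const std::vector<std::string>& entries,
                                         const std::string& query) {
    assert(!query.empty());  // an empty query would match every entry
    std::vector<std::string> hits;
    for (const std::string& entry : entries) {
        if (entry.find(query) != std::string::npos) {
            hits.push_back(entry);
        }
    }
    return hits;
}

// Entry names copied from the PTX listing above.
const std::vector<std::string> kAdd4Entries = {
    "_Z4add4IfEvPT_PKS0_",  // add4<float>
    "_Z4add4IdEvPT_PKS0_",  // add4<double>
};
```

With these two entries, matchingEntries(kAdd4Entries, "add4") returns both names, while the queries "add4If" and "add4Id" each return exactly one. The extra characters that encode the template argument are what make the entry name unique.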

Specify Number of Threads

You specify the number of computational threads for your CUDAKernel by setting two of

its object properties: