User's guide
Run CUDA or PTX Code on GPU
__global__ void simplestKernelEver( float * x, float val )
then the PTX code contains an entry that might be called
_Z18simplestKernelEverPff.
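To see which kernel signature a mangled entry name corresponds to, you can demangle it. The following is a minimal sketch, not part of the toolbox. It assumes a GCC- or Clang-based host toolchain that provides abi::__cxa_demangle in <cxxabi.h>; the helper name demangleEntry is hypothetical.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI)
#include <cstdlib>    // std::free
#include <string>

// Hypothetical helper (an assumption for illustration): returns the
// demangled form of an Itanium-ABI mangled name, or the input unchanged
// if demangling fails.
std::string demangleEntry(const std::string& mangled)
{
    int status = 0;
    // With null buffer arguments, __cxa_demangle allocates the result
    // with malloc; the caller must release it with free.
    char* raw = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || raw == nullptr) {
        return mangled;
    }
    std::string result(raw);
    std::free(raw);
    return result;
}
```

Calling demangleEntry("_Z18simplestKernelEverPff") should give a readable signature along the lines of simplestKernelEver(float*, float), which confirms that the entry belongs to the kernel shown above.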
When you have multiple entry points, specify the entry name for the particular kernel
when calling CUDAKernel to generate your kernel.
Note: The CUDAKernel function searches for your entry name in the PTX file and
matches on any substring occurrence. Therefore, do not give any entry a name that
is a substring of another entry's name.
You might not have control over the original entry names, in which case you must be
aware of the unique mangled name derived for each. For example, consider the following
function template.
template <typename T>
__global__ void add4( T * v1, const T * v2 )
{
    int idx = threadIdx.x;
    v1[idx] += v2[idx];
}
When the template is expanded out for float and double, it results in two entry points,
both of which contain the substring add4.
template __global__ void add4<float>(float *, const float *);
template __global__ void add4<double>(double *, const double *);
The PTX has corresponding entries:
_Z4add4IfEvPT_PKS0_
_Z4add4IdEvPT_PKS0_
Use the entry name add4If to select the float version, and add4Id to select the double
version. For example:
k = parallel.gpu.CUDAKernel('test.ptx','double *, const double *','add4Id');
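The effect of substring matching on these two entries can be illustrated with a small host-side sketch. This is not how CUDAKernel is implemented; it only mimics the matching rule described in the Note above. The names findEntries and kAdd4Entries are hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (an assumption for illustration): returns every
// entry name that contains `pattern` as a substring.
std::vector<std::string> findEntries(const std::vector<std::string>& entries,
                                     const std::string& pattern)
{
    std::vector<std::string> matches;
    for (const std::string& entry : entries) {
        if (entry.find(pattern) != std::string::npos) {
            matches.push_back(entry);
        }
    }
    return matches;
}

// The two PTX entries generated from the add4 template above.
const std::vector<std::string> kAdd4Entries = {
    "_Z4add4IfEvPT_PKS0_",   // add4<float>
    "_Z4add4IdEvPT_PKS0_"    // add4<double>
};
```

With these definitions, findEntries(kAdd4Entries, "add4") returns both entries, so the name add4 alone is ambiguous. The longer substrings "add4If" and "add4Id" each match exactly one entry, which is why they are suitable entry names to pass to CUDAKernel.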
Specify Number of Threads
You specify the number of computational threads for your CUDAKernel by setting two of
its object properties: