User's guide

Run CUDA or PTX Code on GPU

Use gpuArray Variables

Passing gpuArray objects as inputs when running a kernel can be more efficient, because the data is already on the GPU and does not need to be transferred for each call:

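% Create the kernel object from the compiled PTX file and its CU source.
k = parallel.gpu.CUDAKernel('conv.ptx','conv.cu');
% Create the input data and move it to the GPU.
i1 = gpuArray(rand(100,1,'single'));
i2 = gpuArray(rand(100,1,'single'));
% Run the kernel; the output is returned as a gpuArray.
result1 = feval(k,i1,i2);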

Because the output is a gpuArray, you can now perform other operations using this input or output data without further transfers between the MATLAB workspace and the GPU. When all your GPU computations are complete, gather your final result data into the MATLAB workspace:

result2 = feval(k,i1,i2);
r1 = gather(result1);
r2 = gather(result2);
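The workflow described above can be sketched as follows. This is a minimal illustration, assuming the same conv.ptx and conv.cu files are available and a supported GPU is present; the names scaled and total are illustrative and do not come from the guide.

```matlab
% Assumption: conv.ptx and conv.cu are the files used earlier in this section.
k  = parallel.gpu.CUDAKernel('conv.ptx','conv.cu');
i1 = gpuArray(rand(100,1,'single'));
i2 = gpuArray(rand(100,1,'single'));

% The kernel output is a gpuArray, so it stays on the GPU.
result1 = feval(k,i1,i2);

% Follow-on arithmetic on a gpuArray also runs on the GPU and
% produces another gpuArray; no data is copied to the host here.
scaled = 2*result1;

% Transfer only the final reduced value back to the MATLAB workspace.
total = gather(sum(scaled(:)));
```

Only the last line moves data off the GPU, which is the pattern the paragraph above recommends.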

Determine Input and Output Correspondence

When calling [out1, out2] = feval(kernel, in1, in2, in3), the inputs in1, in2, and in3 correspond, in order, to the input arguments of the C function within your CU file. The outputs out1 and out2 store the values of the first and second non-const pointer input arguments to the C function after the C kernel has been executed.
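To make this mapping concrete, here is a hedged sketch using a hypothetical kernel named twoOutputs. The kernel, its file names, and its signature are assumptions made for illustration only; they are not part of the guide.

```matlab
% Hypothetical kernel, assumed to be compiled to twoOutputs.ptx:
%   __global__ void twoOutputs(const float *pIn, float *pInOut1, float *pInOut2)
k = parallel.gpu.CUDAKernel('twoOutputs.ptx','twoOutputs.cu');

in1 = gpuArray(rand(100,1,'single'));    % maps to pIn (const, input only)
in2 = gpuArray(zeros(100,1,'single'));   % maps to pInOut1 (first non-const pointer)
in3 = gpuArray(zeros(100,1,'single'));   % maps to pInOut2 (second non-const pointer)

% Three C arguments mean three feval inputs; two non-const pointers
% mean at most two outputs.
[out1,out2] = feval(k,in1,in2,in3);

% out1 holds the contents of pInOut1 after the kernel has run,
% and out2 holds the contents of pInOut2.
r1 = gather(out1);
r2 = gather(out2);
```

The const pointer pIn has no matching output, because the kernel cannot modify it.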

For example, if the C kernel within a CU file has the following signature:

void reallySimple( float * pInOut, float c )

the corresponding kernel object (k) in MATLAB has the following properties:

MaxNumLHSArguments: 1
   NumRHSArguments: 2
     ArgumentTypes: {'inout single vector' 'in single scalar'}

Therefore, to use the kernel object from this code with feval, you need to provide feval two input arguments (in addition to the kernel object), and you can use one output argument:

y = feval(k,x1,x2)
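A fuller version of this call might look like the sketch below. It assumes that reallySimple has been compiled to a PTX file; the file names reallySimple.ptx and reallySimple.cu and the input values are hypothetical. Because the body of the kernel is not shown here, the sketch checks only the shape and type of the result, not its values.

```matlab
% Assumption: the CU file containing reallySimple was compiled to this PTX file.
k = parallel.gpu.CUDAKernel('reallySimple.ptx','reallySimple.cu');

% Inspect the properties that determine how feval must be called.
nIn  = k.NumRHSArguments;      % number of inputs feval expects
nOut = k.MaxNumLHSArguments;   % maximum number of outputs feval can return

x1 = gpuArray(ones(100,1,'single'));   % maps to float *pInOut
x2 = single(3);                        % maps to float c (a scalar)

% One output, matching the single non-const pointer argument.
y = feval(k,x1,x2);

% Bring the result back to the MATLAB workspace when GPU work is done.
result = gather(y);
```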