User's Guide

Run CUDA or PTX Code on GPU
Use gpuArray Variables
Passing gpuArray objects as kernel inputs can be more efficient, because the data is
already on the GPU and does not have to be transferred for each call:
k = parallel.gpu.CUDAKernel('conv.ptx','conv.cu');
i1 = gpuArray(rand(100,1,'single'));
i2 = gpuArray(rand(100,1,'single'));
result1 = feval(k,i1,i2);
Because the output is a gpuArray, you can now perform other operations using this
input or output data without further transfers between the MATLAB workspace and the
GPU. When all your GPU computations are complete, gather your final result data into
the MATLAB workspace:
result2 = feval(k,i1,i2);
r1 = gather(result1);
r2 = gather(result2);
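As a minimal sketch of working with data that stays on the GPU, the snippet below uses a stand-in gpuArray. It is named result1 only to mirror the example above and is not produced by a kernel, so the snippet runs without the PTX file. It assumes a supported GPU and Parallel Computing Toolbox.

```matlab
% Stand-in for a kernel output (assumption: created directly, not via feval)
result1 = gpuArray(rand(100,1,'single'));

% Arithmetic and reductions on a gpuArray run on the GPU and
% return gpuArray results, so no host transfer happens here
scaled = 2 * result1;
peak = max(abs(scaled));

% Check that the intermediate result is still a gpuArray
stillOnGPU = isa(peak,'gpuArray');

% Transfer only the final scalar to the MATLAB workspace
peakHost = gather(peak);
```

Gathering once at the end, rather than after every step, keeps the number of GPU-to-host transfers to a minimum.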
Determine Input and Output Correspondence
When you call [out1, out2] = feval(kernel, in1, in2, in3), the inputs in1,
in2, and in3 correspond, in order, to the input arguments of the C function in your
CU file. The outputs out1 and out2 hold the values of the first and second non-const
pointer arguments of the C function after the kernel has executed.
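The mapping can be sketched for a kernel with two outputs. The kernel name twoOut, its signature, and the file names twoOut.ptx and twoOut.cu are hypothetical and serve only as an illustration. The snippet assumes that those files exist and contain a single kernel with the signature shown in the comment.

```matlab
% Assumed (hypothetical) kernel signature in twoOut.cu:
%   __global__ void twoOut( float * pA, float * pB, const float * pC )
k = parallel.gpu.CUDAKernel('twoOut.ptx','twoOut.cu');

n = 8;
in1 = zeros(n,1,'single','gpuArray');   % passed as pA (non-const pointer)
in2 = zeros(n,1,'single','gpuArray');   % passed as pB (non-const pointer)
in3 = gpuArray(single(1:n)');           % passed as pC (const, input only)

% out1 receives the contents of pA and out2 the contents of pB after
% the kernel runs; pC is const, so it has no corresponding output
[out1,out2] = feval(k,in1,in2,in3);
```

Under these assumptions, the kernel object should report two possible outputs and three required inputs, because only the non-const pointers can return data.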
For example, if the C kernel within a CU file has the following signature:
void reallySimple( float * pInOut, float c )
the corresponding kernel object (k) in MATLAB has the following properties:
MaxNumLHSArguments: 1
NumRHSArguments: 2
ArgumentTypes: {'inout single vector' 'in single scalar'}
Therefore, to use the kernel object from this code with feval, you must pass feval
two input arguments (in addition to the kernel object), and you can request one output
argument:
y = feval(k,x1,x2)
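A fuller version of this call might look like the following sketch. The file names reallySimple.ptx and reallySimple.cu are hypothetical, and the snippet assumes they contain only the reallySimple kernel shown above. The kernel body is not given in this guide, so no particular output values are implied.

```matlab
% Hypothetical file names; assumed to contain the reallySimple kernel
k = parallel.gpu.CUDAKernel('reallySimple.ptx','reallySimple.cu');

x1 = gpuArray(ones(16,1,'single'));   % passed as pInOut (inout single vector)
x2 = single(3);                       % passed as c (in single scalar)

% y holds the contents of pInOut after the kernel has run
y = feval(k,x1,x2);

% Copy the result to the MATLAB workspace when needed
yHost = gather(y);
```

The kernel's changes to pInOut are returned in y, which has the same size and type as x1. The variable x1 itself is left unchanged.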