User's Guide

Measure and Improve GPU Performance
On the same machine, this code displays the output:
Execution time on CPU = 0.019335
Execution time on GPU = 0.027235
Maximum absolute error = 1.1374e-14
Unfortunately, the GPU is slower than the CPU for this problem. The reason is that the
for-loop executes the FFT, multiplication, and inverse FFT operations on individual
columns of length 4096. Each of these calls does relatively little work, so the fixed
overhead of launching operations on the GPU dominates. The best way to increase the
performance is to vectorize the code, so that a single MATLAB function call performs
more computation. The FFT and IFFT operations are easy to vectorize: fft(A) computes
the FFT of each column of a matrix A. You can multiply every column of a matrix by the
filter at once using the MATLAB binary singleton expansion function bsxfun. The
vectorized function looks like this:
function y = fastConvolution_v2(data,filter)
m = size(data,1);
% Zero-pad filter to the length of data, and transform
filter_f = fft(filter,m);
% Transform each column of the input
af = fft(data);
% Multiply each column by filter and compute inverse transform
y = ifft(bsxfun(@times,af,filter_f));
end
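To confirm that the vectorized function computes the same result as processing one column at a time, you can run a quick CPU-only check such as the following minimal sketch. The column-by-column loop is a hypothetical reference written for this check and is not necessarily identical to the original fastConvolution function. The sketch also assumes MATLAB R2016b or later, where implicit expansion lets af.*filter_f stand in for the bsxfun call.

```matlab
% Sketch: compare the vectorized convolution against a per-column loop.
% Assumes MATLAB R2016b or later (implicit expansion for af.*filter_f).
a = complex(randn(4096,100),randn(4096,100)); % Data input, same sizes as the example
b = randn(16,1);                               % Filter input
m = size(a,1);
filter_f = fft(b,m);                           % Zero-pad filter to length m, and transform

% Hypothetical column-by-column reference: one fft, multiply, and ifft per column
yLoop = zeros(size(a));
for k = 1:size(a,2)
    yLoop(:,k) = ifft(fft(a(:,k)).*filter_f);
end

% Vectorized: fft(a) transforms every column of a in a single call
af = fft(a);
yBsx = ifft(bsxfun(@times,af,filter_f));       % Explicit singleton expansion
yImp = ifft(af.*filter_f);                     % Implicit expansion (R2016b+)

% Differences relative to the loop should be at the level of round-off
errBsx = max(abs(yBsx(:) - yLoop(:)));
errImp = max(abs(yImp(:) - yLoop(:)));
disp(['Max difference, bsxfun vs loop = ',num2str(errBsx)]);
disp(['Max difference, implicit expansion vs loop = ',num2str(errImp)]);
```

The vectorized form replaces 100 separate fft and ifft calls with one call to each, which is what gives the GPU enough work per call to be worthwhile.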
Perform the same experiment using the vectorized function:
a = complex(randn(4096,100),randn(4096,100)); % Data input
b = randn(16,1); % Filter input
c = fastConvolution_v2(a,b); % Calculate output
ctime = timeit(@()fastConvolution_v2(a,b)); % Measure CPU time
disp(['Execution time on CPU = ',num2str(ctime)]);
ga = gpuArray(a); % Move data to GPU
gb = gpuArray(b); % Move filter to GPU
gc = fastConvolution_v2(ga, gb); % Calculate on GPU
gtime = gputimeit(@()fastConvolution_v2(ga,gb)); % Measure GPU time
gerr = max(max(abs(gather(gc)-c))); % Calculate error
disp(['Execution time on GPU = ',num2str(gtime)]);
disp(['Maximum absolute error = ',num2str(gerr)]);
Execution time on CPU = 0.010393