CUDA exploits the massively parallel characteristics of NVIDIA's ubiquitous silicon.
CUDA is taught in universities worldwide
and used in many R&D labs, so a large
number of programmers are available and
there is a wealth of web-based resources.
CUDA software development tools:
• NVIDIA C Compiler for parallel GPU code
• CUDA Debugger
• CUDA Visual Profiler
• SDK with best-practice guides
• Parallel Nsight®
Advanced libraries include:
• NVIDIA Performance Primitives (image and video)
• Basic Linear Algebra Subprograms
• FFT
• VSIPL
GE GPGPU products also support Open
Computing Language (OpenCL), the first
open language for writing programs that
execute across CPUs, GPUs, and other
processors. OpenCL includes a language for
writing kernels, defines APIs for controlling
devices, and supports both task-based and
data-based parallelism.
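As an illustration of the kernel language the paragraph above mentions, the same vector addition shown later in C for CUDA might be written as an OpenCL kernel roughly as follows (a sketch only; the host-side setup of platform, context, and command queue is omitted, and the kernel name is hypothetical):

```c
// OpenCL C kernel: each work-item adds one element.
// get_global_id(0) plays the role of the CUDA thread index.
__kernel void vecAdd(__global const float* A,
                     __global const float* B,
                     __global float* C)
{
    int i = get_global_id(0);
    C[i] = A[i] + B[i];
}
```

Because OpenCL targets CPUs and GPUs alike, the same kernel source can be compiled at run time for whichever device is available.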
Open Graphics Library (OpenGL) is a
standard specification defining a cross-
language, cross-platform API for writing
applications that produce 2D and 3D
computer graphics. OpenGL is used in the
graphics output stages.
C for CUDA extends C by allowing the programmer to define C functions, called kernels, that
when called are executed N times in parallel by N different CUDA threads, as opposed to
only once like regular C functions.
A kernel is defined using the __global__ declaration specifier and the number of CUDA
threads for each call is specified using a new <<<…>>> syntax:
// Kernel definition: thread i computes element i
__global__ void vecAdd(float* A, float* B, float* C)
{
    int i = threadIdx.x;
    C[i] = A[i] + B[i];
}

int main()
{
    // Kernel invocation with N threads
    // (A, B, and C must point to GPU device memory)
    vecAdd<<<1, N>>>(A, B, C);
}
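The example above launches a single block of N threads, which limits N to the maximum block size of the device. For larger arrays, a common variant (sketched here under the assumption of a kernel taking an explicit length parameter) groups threads into multiple blocks and combines the block and thread indices:

```c
// Sketch: multi-block variant of vecAdd.
// The global index combines blockIdx, blockDim, and threadIdx.
__global__ void vecAdd(float* A, float* B, float* C, int N)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N)            // guard: the last block may be partially full
        C[i] = A[i] + B[i];
}

// Launch enough 256-thread blocks to cover all N elements:
// vecAdd<<<(N + 255) / 256, 256>>>(A, B, C, N);
```

The bounds check is needed because the number of launched threads is rounded up to a whole number of blocks.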
[Figure: Processing flow on CUDA (GPU: GeForce 8800) — 1. Copy processing data from main memory to GPU memory; 2. The CPU instructs the GPU to process; 3. The GPU executes in parallel in each core; 4. Copy the result from GPU memory back to main memory.]
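The processing flow described above maps directly onto host-side CUDA runtime calls. A minimal sketch, using the vecAdd kernel shown earlier (error checking and input initialization omitted; the array size is an assumption):

```c
int main()
{
    const int N = 256;
    size_t size = N * sizeof(float);
    float *hA = (float*)malloc(size), *hB = (float*)malloc(size),
          *hC = (float*)malloc(size);
    float *dA, *dB, *dC;

    // 1. Allocate GPU memory and copy the processing data to it
    cudaMalloc(&dA, size); cudaMalloc(&dB, size); cudaMalloc(&dC, size);
    cudaMemcpy(dA, hA, size, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, size, cudaMemcpyHostToDevice);

    // 2./3. The CPU instructs the GPU; the cores execute in parallel
    vecAdd<<<1, N>>>(dA, dB, dC);

    // 4. Copy the result back to main memory
    cudaMemcpy(hC, dC, size, cudaMemcpyDeviceToHost);

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    free(hA); free(hB); free(hC);
    return 0;
}
```

Note that the kernel launch is asynchronous; the cudaMemcpy of the result implicitly waits for the kernel to finish before copying.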