System information

Intel® Xeon Phi™ Coprocessor DEVELOPER’S QUICK START GUIDE

ulimit -s unlimited

7. Go to /tmp and run a.out:

cd /tmp

./a.out

Parallel Programming Options on the Intel® Xeon Phi™ Coprocessor

Most of the parallel programming options available on host systems are also available on the Intel® Xeon Phi™ Coprocessor. These include the following:

1. Intel® Threading Building Blocks (Intel® TBB)

2. OpenMP*

3. Intel® Cilk Plus

4. pthreads*

The following sections discuss the use of these parallel programming models in code that uses the offload extensions. Code that runs natively on the Intel® Xeon Phi™ Coprocessor can use these models just as it would on the host, with no unusual complications beyond the larger number of threads.
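
As a minimal sketch of the point about native code, the following ordinary OpenMP reduction contains nothing specific to the coprocessor, so the same source could in principle be built for the host or for native execution. The function name native_sum is hypothetical and used only for illustration; how the file is compiled for native execution depends on your compiler setup and is not shown here.

```c
/* Hypothetical example: sums an array with a plain OpenMP reduction.
   No offload pragma is used; the OpenMP runtime on whichever system
   runs the code decides how many threads form the team. */
float native_sum(const float *data, int size)
{
    float total = 0.0f;

    /* Each thread accumulates a private partial sum; the partial sums
       are combined into total when the loop finishes. */
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < size; ++i) {
        total += data[i];
    }
    return total;
}
```

If OpenMP support is not enabled at compile time, the pragma is ignored and the loop simply runs on a single thread, producing the same result.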

Parallel Programming on the Intel® Xeon Phi™ Coprocessor: OpenMP*

There is no correspondence between OpenMP* threads on the host CPU and those on the Intel® Xeon Phi™ Coprocessor. Because an OpenMP parallel region within an offload pragma is offloaded as a unit, the offload compiler creates a team of threads based on the resources available on the coprocessor. Since the entire OpenMP construct executes on the coprocessor, the usual OpenMP semantics of shared and private data apply within the construct.
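
To illustrate how shared and private data behave inside an offloaded construct, here is a hedged sketch that follows the same pragma style as the reduction example in this guide. The function scale_in_place and the variables factor and tmp are hypothetical, and the inout clause is assumed to copy the array to the coprocessor and back. On a compiler without the offload extensions, the offload pragma is typically ignored and the loop runs on the host.

```c
/* Hypothetical example: multiplies every element of data by factor.
   data is shared by the whole thread team; tmp is private, so each
   thread works on its own copy and no thread overwrites another's. */
void scale_in_place(float *data, int size, float factor)
{
    float tmp = 0.0f;

    #pragma offload target(mic) in(size) in(factor) inout(data:length(size))
    {
        #pragma omp parallel for private(tmp) shared(data)
        for (int i = 0; i < size; ++i) {
            tmp = data[i] * factor;  /* per-thread scratch value */
            data[i] = tmp;           /* each iteration writes a distinct element */
        }
    }
}
```

Because each iteration touches a different element of the shared array, no further synchronization is needed inside the loop.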

Multiple host CPU threads can offload to the Intel® Xeon Phi™ Coprocessor at any time. If a CPU thread attempts to offload and resources are not available on the coprocessor, the code meant to be offloaded may be executed on the host instead. When a thread on the coprocessor reaches the “omp parallel” directive, it creates a team of threads based on the resources available on the coprocessor. The theoretical maximum number of hardware threads that can be created is four times the number of cores in your Intel® Xeon Phi™ Coprocessor. For offloaded code, the practical limit is four fewer than this, because the first core is reserved for the uOS and its services.
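
The thread-count arithmetic described above can be sketched as two small helper functions. Both function names are hypothetical and exist only to make the calculation concrete; the core count is whatever your particular coprocessor provides.

```c
/* Theoretical maximum: four hardware threads per core. */
int theoretical_max_threads(int cores)
{
    return 4 * cores;
}

/* Practical limit for offloaded code: one core, and therefore four
   hardware threads, is reserved for the uOS and its services. */
int practical_offload_threads(int cores)
{
    return theoretical_max_threads(cores) - 4;
}
```

For a hypothetical 60-core coprocessor, these functions return 240 and 236 respectively.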

The code below shows a single host CPU thread attempting to offload a reduction to the Intel® Xeon Phi™ Coprocessor, using OpenMP inside the offload construct.

float OMP_reduction(float *data, int size)
{
    float ret = 0;
    #pragma offload target(mic) in(size) in(data:length(size))
    {
        #pragma omp parallel for reduction(+:ret)
        for (int i=0; i<size; ++i)
        {
            ret += data[i];