System information

Intel® Xeon Phi Coprocessor DEVELOPERS QUICK START GUIDE
ulimit -s unlimited
7. Go to /tmp and run a.out:
cd /tmp
./a.out
Parallel Programming Options on the Intel® Xeon Phi™ Coprocessor
Most of the parallel programming options available on the host systems are available for the Intel® Xeon Phi™
Coprocessor. These include the following:
1. Intel® Threading Building Blocks (Intel® TBB)
2. OpenMP*
3. Intel® Cilk™ Plus
4. pthreads*
The following sections discuss the use of these parallel programming models in code that uses the offload
extensions. Code that runs natively on the Intel® Xeon Phi™ Coprocessor can use these parallel programming
models just as it would on the host, with no unusual complications beyond the larger number of threads.
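As a minimal sketch of that last point, the function below uses an ordinary OpenMP loop with no offload pragma at all, so the same source would serve on the host or as a native coprocessor build. The name saxpy and its signature are illustrative and do not come from this guide, and the build line in the comment assumes the Intel compiler's -mmic and -openmp options.

```c
/* Illustrative only: plain OpenMP code, identical for host and native builds.
   Assumed native build line: icc -mmic -openmp saxpy.c */
void saxpy(int n, float a, const float *x, float *y)
{
    int i;
    /* Each iteration is independent, so the loop can be split across threads. */
    #pragma omp parallel for
    for (i = 0; i < n; ++i)
    {
        y[i] = a * x[i] + y[i];
    }
}
```

On the coprocessor the only practical difference is the size of the thread team the OpenMP runtime creates, which is typically much larger than on the host.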
Parallel Programming on the Intel® Xeon Phi™ Coprocessor: OpenMP*
There is no correspondence between OpenMP threads on the host CPU and on the Intel® Xeon Phi™
Coprocessor. Because an OpenMP parallel region within an offload pragma is offloaded as a unit, the offload
compiler creates a team of threads based on the resources available on the Intel® Xeon Phi™ Coprocessor. Since
the entire OpenMP construct is executed on the coprocessor, the usual
OpenMP* semantics of shared and private data apply within the construct.
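The shared and private semantics described above can be sketched as follows. The function name scale_and_square and the variable tmp are hypothetical, chosen only for illustration. The offload pragma is specific to the Intel compiler; a compiler that does not recognize it is expected to ignore it and run the loop on the host.

```c
/* Sketch: an OpenMP parallel region inside an offload pragma.
   scale_and_square is a hypothetical name, not part of this guide. */
void scale_and_square(float *data, int size, float factor)
{
    #pragma offload target(mic) in(size) in(factor) inout(data:length(size))
    {
        float tmp;  /* declared outside the loop, so it would be shared by default */
        int i;      /* the loop index of a parallel for is private automatically */

        /* data is shared by the whole team; each thread gets its own copy of tmp. */
        #pragma omp parallel for private(tmp)
        for (i = 0; i < size; ++i)
        {
            tmp = data[i] * factor;
            data[i] = tmp * tmp;
        }
    }
}
```

Without the private(tmp) clause, all threads in the team would write to the same tmp and the results would be unpredictable, exactly as they would be in an OpenMP region on the host.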
Multiple host CPU threads can offload to the Intel® Xeon Phi™ Coprocessor at any time. If a CPU thread
attempts to offload and resources are not available on the coprocessor,
the code meant to be offloaded may be executed on the host instead. When a thread on the coprocessor reaches the
“omp parallel” directive, it creates a team of threads based on the resources available on the coprocessor. The
theoretical maximum number of hardware threads is four times the number of cores in your
Intel® Xeon Phi™ Coprocessor. For offloaded code, the practical limit is four fewer than this, because the first
core is reserved for the uOS and its services.
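The thread-count arithmetic above can be written out in a few lines of C. The helper names max_hw_threads and max_offload_threads are hypothetical, and the 60-core figure in the note that follows is only an assumed core count used for illustration.

```c
/* Hypothetical helpers illustrating the thread-count arithmetic above. */

#define THREADS_PER_CORE 4  /* hardware threads per core, as stated in the text */

/* Theoretical maximum: four hardware threads on every core. */
int max_hw_threads(int cores)
{
    return cores * THREADS_PER_CORE;
}

/* Practical limit for offloaded code: one core (four threads)
   is set aside for the uOS and its services. */
int max_offload_threads(int cores)
{
    return (cores - 1) * THREADS_PER_CORE;
}
```

For an assumed 60-core coprocessor, max_hw_threads(60) gives 240 and max_offload_threads(60) gives 236.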
The code below shows a single host CPU thread attempting to offload a reduction to
the Intel® Xeon Phi™ Coprocessor, using OpenMP inside the offload construct.
float OMP_reduction(float *data, int size)
{
    float ret = 0;
    #pragma offload target(mic) in(size) in(data:length(size))
    {
        #pragma omp parallel for reduction(+:ret)
        for (int i=0; i<size; ++i)
        {
            ret += data[i];