System information

Intel® Xeon Phi™ Coprocessor DEVELOPER’S QUICK START GUIDE

For csh – setenv H_TRACE 2

For sh – export H_TRACE=2

To print the compiler’s internal offload timers, set OFFLOAD_REPORT. A value of 1 reports only the time the offload took, as measured by the host, and the computation time spent on the coprocessor. A value of 2 also reports how much data was transferred in each direction.

For csh – setenv OFFLOAD_REPORT <1 or 2>

For sh – export OFFLOAD_REPORT=<1 or 2>

Details can be found in the compiler documentation in the “Compilation/Setting Environment Variables” section.

Where To Get More Help

You can post questions on the Intel® Xeon Phi™ Coprocessor forum at http://software.intel.com/en-us/forums/intel-many-integrated-core .

Using the Offload Compiler – Explicit Memory Copy Model

In this section, a reduction is used as an example to show a step-by-step approach for developing applications for the Intel® Xeon Phi™ Coprocessor using the offload compiler. The offload compiler is a heterogeneous compiler, with both host CPU and target compilation environments. Code for both the host CPU and Intel® Xeon Phi™ coprocessor is compiled within the host environment, and offloaded code is automatically run within the target environment. The offload behavior is controlled by compiler directives: pragmas in C/C++, and directives in Fortran.

Some common libraries, such as the Intel® Math Kernel Library (Intel® MKL), are available in host versions as well as target versions. When an application executes its first offload and the target is available, the runtime loads the target executable onto the Intel® Xeon Phi™ Coprocessor. At this time, it also initializes the libraries linked with the target code. The loaded target executable remains in the target memory until the host program terminates. Thus, any global state maintained by the library is maintained across offload instances.

Note: Although the user may specify the region of code to run on the target, there is no guarantee of execution on the Intel® Xeon Phi™ Coprocessor. Depending on the presence of the target hardware and the availability of resources on the coprocessor when execution reaches the region of code marked for offload, the code may run on the Intel® Xeon Phi™ Coprocessor or fall back to executing on the host.
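Because of this fallback, it can be useful to check at run time where a marked region actually executed. The following is a minimal sketch, not taken from this guide: it assumes the Intel compiler's `#pragma offload target(mic)` syntax and assumes that the `__MIC__` macro is defined only when code is compiled for the coprocessor. The function name `offload_ran_on_target` is hypothetical.

```c
/* Hypothetical helper: returns 1 if the marked region ran on the
   coprocessor, 0 if it fell back to the host. */
int offload_ran_on_target(void)
{
    int on_target = 0;

    /* Intel offload pragma. A compiler without offload support ignores
       the pragma and simply executes the block on the host. */
    #pragma offload target(mic) inout(on_target)
    {
#ifdef __MIC__
        /* Assumption: __MIC__ is defined only in the target compilation. */
        on_target = 1;
#else
        on_target = 0;
#endif
    }
    return on_target;
}
```

Calling `offload_ran_on_target()` once at startup and logging the result is one way to confirm that the coprocessor is being used before interpreting any timing numbers.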

The following code samples show several versions of porting reduction code to the Intel® Xeon Phi™ Coprocessor using the offload pragma directive.

Reduction

The reduction operation computes the expression:

ans = a[0] + a[1] + … + a[n-1]
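As a point of reference, the expression above can be written as a plain serial loop on the host. This is only an illustrative baseline under assumed names (`reduce_serial`, single-precision data), not one of the guide's ported versions.

```c
#include <stddef.h>

/* Serial host-side reduction: returns a[0] + a[1] + ... + a[n-1].
   The function name and the float element type are assumptions made
   for illustration. */
float reduce_serial(const float *a, size_t n)
{
    float ans = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        ans += a[i];
    }
    return ans;
}
```

For an array holding 1, 2, 3 and 4, `reduce_serial(a, 4)` returns 10; an empty array yields 0.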
