System information
Intel® Xeon Phi™ Coprocessor DEVELOPER’S QUICK START GUIDE
For csh – setenv H_TRACE 2
For sh – export H_TRACE=2
To print the compiler’s internal offload timers, set the OFFLOAD_REPORT environment variable. A value of 1
reports only the time the offload took, as measured by the host, and the amount of computation time spent on
the coprocessor. A value of 2 also reports how much data was transferred in each direction.
For csh – setenv OFFLOAD_REPORT <1 or 2>
For sh – export OFFLOAD_REPORT=<1 or 2>
Details can be found in the compiler documentation in the “Compilation/Setting Environment Variables” section.
Where To Get More Help
You can post questions on the Intel® Xeon Phi™ Coprocessor forum, which can be found at
http://software.intel.com/en-us/forums/intel-many-integrated-core .
Using the Offload Compiler – Explicit Memory Copy Model
In this section, a reduction is used as an example to show a step-by-step approach for developing applications
for the Intel® Xeon Phi™ Coprocessor using the offload compiler. The offload compiler is a heterogeneous [2]
compiler, with both host CPU and target compilation environments. Code for both the host CPU and Intel® Xeon
Phi™ coprocessor is compiled within the host environment, and offloaded code is automatically run within the
target environment. The offload behavior is controlled by compiler directives: pragmas in C/C++, and
directives in Fortran.
Some common libraries, such as the Intel® Math Kernel Library (Intel® MKL), are available in host versions as
well as target versions. When an application executes its first offload and the target is available, the runtime
loads the target executable onto the Intel® Xeon Phi™ Coprocessor. At this time, it also initializes the libraries
linked with the target code. The loaded target executable remains in the target memory until the host
program terminates. Thus, any global state maintained by the library is maintained across offload instances.
Note: Although the user may specify a region of code to run on the target, execution on the Intel® Xeon Phi™
Coprocessor is not guaranteed. Depending on whether the target hardware is present and whether resources are
available on the Intel® Xeon Phi™ Coprocessor when execution reaches the region of code marked for offload,
the code either runs on the Intel® Xeon Phi™ Coprocessor or falls back to executing on the host.
The following code samples show several versions of porting reduction code to the Intel® Xeon Phi™
Coprocessor using the offload pragma directive.
Reduction
Reduction refers to computing the expression:
ans = a[0] + a[1] + … + a[n-1]
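As a point of reference before any offload directives are added, the expression above can be sketched as a plain serial loop that runs entirely on the host. The function name reduce_sum and the float element type are assumptions made for illustration; they are not taken from this guide.

```c
#include <stddef.h>

/* Serial host-only reduction: returns a[0] + a[1] + ... + a[n-1].
   reduce_sum and the float element type are illustrative assumptions. */
float reduce_sum(const float *a, size_t n)
{
    float ans = 0.0f;              /* sum of an empty array is 0 */
    for (size_t i = 0; i < n; ++i) {
        ans += a[i];               /* accumulate in index order */
    }
    return ans;
}
```

Calling reduce_sum on an array holding 1, 2, 3 and 4 with n = 4 returns 10. This sketch contains no offload pragma, so it always executes on the host CPU.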
[2] http://dictionary.reference.com/browse/heterogeneous