System information
Intel® Xeon Phi™ Coprocessor DEVELOPER’S QUICK START GUIDE
Step 2: Send the data to the Intel® Xeon Phi™ Coprocessor using #pragma offload. In this
example, the free_if(0) qualifier keeps the buffers allocated on the Intel® Xeon Phi™
Coprocessor after the offload section exits, so the data persists for later offloads.
#define PHI_DEV 0
#pragma offload target(mic:PHI_DEV) \
    in(A:length(matrix_elements) free_if(0)) \
    in(B:length(matrix_elements) free_if(0)) \
    in(C:length(matrix_elements) free_if(0))
{
}
Code Example 14: Sending the Data to the Intel® Xeon Phi™ Coprocessor
Step 3: Call sgemm inside the offload section to use the “Native Acceleration” version of Intel® MKL on
the Intel® Xeon Phi™ Coprocessor. The nocopy() qualifier, combined with alloc_if(0), causes the data
copied to the card in step 2 to be reused instead of being transferred again.
#pragma offload target(mic:PHI_DEV) \
    in(transa, transb, N, alpha, beta) \
    nocopy(A: alloc_if(0) free_if(0)) nocopy(B: alloc_if(0) free_if(0)) \
    out(C:length(matrix_elements) alloc_if(0) free_if(0)) // output data
{
    sgemm(&transa, &transb, &N, &N, &N, &alpha, A, &N, B, &N,
          &beta, C, &N);
}
Code Example 15: Calling sgemm Inside the Offload Section
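To make the arguments of the sgemm call in step 3 easier to read, the following is a minimal reference sketch of the operation it requests, C = alpha * A * B + beta * C. It assumes no transposition (transa = transb = 'N'), square N x N matrices, and column-major storage with every leading dimension equal to N. The function name ref_sgemm_nn is hypothetical and is not part of Intel® MKL; the sketch is only useful for checking results on small inputs and makes no attempt at performance.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference helper (not an Intel MKL routine).
 * Computes C = alpha * A * B + beta * C for square n x n matrices stored
 * in column-major order: element (row i, column j) lives at [i + j * n].
 * Assumes no transposition and leading dimensions equal to n. */
void ref_sgemm_nn(int n, float alpha, const float *A,
                  const float *B, float beta, float *C)
{
    assert(n > 0 && A != NULL && B != NULL && C != NULL);

    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < n; ++i) {
            /* Dot product of row i of A with column j of B. */
            float acc = 0.0f;
            for (int k = 0; k < n; ++k)
                acc += A[i + (size_t)k * n] * B[k + (size_t)j * n];

            size_t idx = i + (size_t)j * n;
            /* When beta is zero, the old contents of C are ignored. */
            C[idx] = (beta == 0.0f) ? alpha * acc
                                    : alpha * acc + beta * C[idx];
        }
    }
}
```

Comparing the output of this helper with the C returned by the offloaded sgemm on a small matrix is one way to confirm that the data layout and the persistent buffers are set up as intended.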
Step 4: Free the memory you copied to the card in step 2. The alloc_if(0) qualifier reuses the
existing allocation on the card on entering the offload section, and the free_if(1) qualifier frees
that allocation on exit.
#pragma offload target(mic:PHI_DEV) \
    in(A:length(matrix_elements) alloc_if(0) free_if(1)) \
    in(B:length(matrix_elements) alloc_if(0) free_if(1)) \
    in(C:length(matrix_elements) alloc_if(0) free_if(1))
{
}
Code Example 16: Freeing the Copied Memory
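The allocate, reuse, and free pattern of steps 2 through 4 can be condensed into a single-buffer sketch. The function double_on_card is hypothetical and uses only the qualifier combinations shown in the steps above. It assumes that the Intel compiler defines __INTEL_OFFLOAD when offload compilation is enabled; with any other compiler the pragmas are skipped and the blocks simply run on the host, which makes the control flow easy to test without a coprocessor.

```c
#include <assert.h>
#include <stddef.h>

#define PHI_DEV 0

/* Hypothetical helper: doubles each element of v while keeping the
 * buffer resident on the coprocessor between offload sections.
 * Without offload support the three blocks execute on the host. */
void double_on_card(float *v, int n)
{
    assert(v != NULL && n > 0);

    /* Allocate on the card (default alloc_if(1)), copy v in, and keep
     * the buffer on exit. */
#ifdef __INTEL_OFFLOAD
#pragma offload target(mic:PHI_DEV) \
    in(v:length(n) free_if(0))
#endif
    {
    }

    /* Reuse the resident buffer, compute, copy the result back to the
     * host, and keep the buffer on exit. */
#ifdef __INTEL_OFFLOAD
#pragma offload target(mic:PHI_DEV) \
    out(v:length(n) alloc_if(0) free_if(0))
#endif
    {
        for (int i = 0; i < n; ++i)
            v[i] *= 2.0f;
    }

    /* Reuse the buffer one last time and release it on exit. */
#ifdef __INTEL_OFFLOAD
#pragma offload target(mic:PHI_DEV) \
    in(v:length(n) alloc_if(0) free_if(1))
#endif
    {
    }
}
```

Every section that uses free_if(0) leaves an allocation behind on the card, so each buffer needs exactly one final section with free_if(1); otherwise the card memory stays allocated until the process exits.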
As with Intel® MKL on any platform, you can limit the number of threads it uses by setting the number
of allowed OpenMP threads before calling the MKL function within the offloaded code.
#pragma offload target(mic:PHI_DEV) \
    in(transa, transb, N, alpha, beta) \
    nocopy(A: alloc_if(0) free_if(0)) nocopy(B: alloc_if(0) free_if(0))