
Intel® Xeon Phi™ Coprocessor DEVELOPER’S QUICK START GUIDE

Step 2: Send the data to the Intel® Xeon Phi™ Coprocessor using #pragma offload. In this example, the free_if(0) qualifier keeps the data allocated on the Intel® Xeon Phi™ Coprocessor after the offload section exits, which makes it persistent.

#define PHI_DEV 0

#pragma offload target(mic:PHI_DEV) \
    in(A:length(matrix_elements) free_if(0)) \
    in(B:length(matrix_elements) free_if(0)) \
    in(C:length(matrix_elements) free_if(0))
{
}

Code Example 14: Sending the Data to the Intel® Xeon Phi™ Coprocessor
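The length() modifier counts elements of the pointed-to type, not bytes, so it can be useful to estimate how much data Step 2 actually moves to the card. The following is a minimal sketch, assuming square N x N single-precision matrices (consistent with the sgemm call, which passes N for every dimension). The helper names matrix_elements_for and step2_transfer_bytes are hypothetical and are not part of the guide or of any Intel library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: element count of one square n x n matrix,
   i.e. the value that would be passed to length(). */
size_t matrix_elements_for(size_t n)
{
    /* Guard against overflow of n * n. */
    assert(n == 0 || n <= SIZE_MAX / n);
    return n * n;
}

/* Hypothetical helper: bytes sent by the three in() clauses of Step 2
   (A, B and C), assuming the data type is float. */
size_t step2_transfer_bytes(size_t n)
{
    const size_t arrays_sent = 3; /* A, B and C */
    return arrays_sent * matrix_elements_for(n) * sizeof(float);
}
```

Comparing the returned byte count with the memory available on the card gives a rough upper bound on the matrix size that can be kept persistent.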

Step 3: Call sgemm inside the offload section to use the “Native Acceleration” version of Intel® MKL on the Intel® Xeon Phi™ Coprocessor. The nocopy() qualifier causes the data copied to the card in step 2 to be reused.

#pragma offload target(mic:PHI_DEV) \
    in(transa, transb, N, alpha, beta) \
    nocopy(A: alloc_if(0) free_if(0)) nocopy(B: alloc_if(0) free_if(0)) \
    out(C:length(matrix_elements) alloc_if(0) free_if(0)) // output data
{
    sgemm(&transa, &transb, &N, &N, &N, &alpha, A, &N, B, &N,
          &beta, C, &N);
}

Code Example 15: Calling sgemm Inside the Offload Section
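One way to sanity-check the values copied back into C is to compare them with a plain host-side computation. The sketch below is a naive reference for the no-transpose case only, assuming column-major N x N matrices as the Fortran-style sgemm interface expects. The names ref_sgemm_nn and max_abs_diff are hypothetical helpers, not Intel® MKL functions, and the triple loop is meant for small test sizes rather than performance.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference: C = alpha * A * B + beta * C for column-major
   n x n matrices, with neither A nor B transposed. */
void ref_sgemm_nn(int n, float alpha, const float *a, const float *b,
                  float beta, float *c)
{
    assert(n > 0);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < n; ++i) {
            float acc = 0.0f;
            for (int k = 0; k < n; ++k)
                acc += a[i + (size_t)k * n] * b[k + (size_t)j * n];
            size_t idx = i + (size_t)j * n;
            /* Like BLAS, do not read C when beta is zero. */
            c[idx] = (beta == 0.0f) ? alpha * acc
                                    : alpha * acc + beta * c[idx];
        }
    }
}

/* Hypothetical helper: largest absolute element-wise difference. */
float max_abs_diff(const float *x, const float *y, size_t count)
{
    float worst = 0.0f;
    for (size_t i = 0; i < count; ++i) {
        float d = x[i] - y[i];
        if (d < 0.0f)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

In practice the offloaded result and the reference will differ by small rounding errors, so the difference returned by max_abs_diff would normally be compared against a tolerance rather than against zero.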

Step 4: Free the memory you copied to the card in step 2. The alloc_if(0) qualifier reuses the data already on the card when entering the offload section, and the free_if(1) qualifier frees that data on exit.

#pragma offload target(mic:PHI_DEV) \
    in(A:length(matrix_elements) alloc_if(0) free_if(1)) \
    in(B:length(matrix_elements) alloc_if(0) free_if(1)) \
    in(C:length(matrix_elements) alloc_if(0) free_if(1))
{
}

Code Example 16: Set the Copied Memory Free
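Steps 2 through 4 can be read as one allocate, reuse, free lifecycle. The sketch below puts the three offload sections into a single function and gives the alloc_if/free_if combinations readable macro names. It is an illustration under stated assumptions: the function persistent_add is hypothetical, an element-wise add stands in for the sgemm call so that the example does not depend on Intel® MKL, and alloc_if(1) is written out explicitly in the first step. A compiler without offload support is expected to ignore the pragmas and run the whole function on the host.

```c
#include <assert.h>
#include <stddef.h>

/* Readable names for the qualifier combinations used in Steps 2 to 4. */
#define ALLOC  alloc_if(1)  /* allocate card memory on entry */
#define REUSE  alloc_if(0)  /* reuse memory already on the card */
#define RETAIN free_if(0)   /* keep card memory after the offload */
#define FREE   free_if(1)   /* release card memory after the offload */

#define PHI_DEV 0

/* Hypothetical example: c[i] = a[i] + b[i] stands in for sgemm. */
void persistent_add(float *a, float *b, float *c, int n)
{
    assert(a != NULL && b != NULL && c != NULL && n > 0);

    /* Step 2: send the data and keep it on the card. */
    #pragma offload target(mic:PHI_DEV) \
        in(a:length(n) ALLOC RETAIN) \
        in(b:length(n) ALLOC RETAIN) \
        in(c:length(n) ALLOC RETAIN)
    {
    }

    /* Step 3: compute on the persistent data, copy only c back. */
    #pragma offload target(mic:PHI_DEV) \
        in(n) \
        nocopy(a:length(n) REUSE RETAIN) nocopy(b:length(n) REUSE RETAIN) \
        out(c:length(n) REUSE RETAIN)
    {
        for (int i = 0; i < n; ++i)
            c[i] = a[i] + b[i];
    }

    /* Step 4: release the card memory, mirroring Code Example 16. */
    #pragma offload target(mic:PHI_DEV) \
        in(a:length(n) REUSE FREE) \
        in(b:length(n) REUSE FREE) \
        in(c:length(n) REUSE FREE)
    {
    }
}
```

Keeping the three sections in one place makes it easier to check that every buffer retained in the first step is released in the last one.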

As with Intel® MKL on any platform, you can limit the number of threads MKL uses by setting the number of allowed OpenMP threads before calling the MKL function within the offloaded code.

#pragma offload target(mic:PHI_DEV) \
    in(transa, transb, N, alpha, beta) \
    nocopy(A: alloc_if(0) free_if(0)) nocopy(B: alloc_if(0) free_if(0))