APIs for Dynamic Aligned Shared memory allocation
void *_Offload_shared_aligned_malloc(size_t size, size_t alignment);
void _Offload_shared_aligned_free(void *p);
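As a rough sketch of how the aligned variants might be wrapped: the names `shared_alloc_aligned`, `shared_free_aligned`, `round_up`, and `is_aligned` below are hypothetical, and the 64-byte `VEC_ALIGN` is an assumption chosen to match the coprocessor's 512-bit vector width. The sketch also assumes that the Intel compiler defines `__INTEL_OFFLOAD` when offload is enabled and that `offload.h` declares the `_Offload_shared_*` functions. On any other compiler it falls back to C11 `aligned_alloc`, which only keeps the code buildable and provides no offload semantics.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#ifdef __INTEL_OFFLOAD
#include <offload.h>  /* assumption: declares the _Offload_shared_* functions */
#endif

#define VEC_ALIGN 64  /* assumption: 64 bytes = one 512-bit vector register */

/* Hypothetical helper: round n up to a multiple of a power-of-two alignment. */
size_t round_up(size_t n, size_t alignment)
{
    return (n + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical helper: returns 1 if p is a multiple of alignment, else 0. */
int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}

/* Hypothetical wrapper around the aligned "shared" allocator. */
void *shared_alloc_aligned(size_t size, size_t alignment)
{
    /* The alignment must be a nonzero power of two. */
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
#ifdef __INTEL_OFFLOAD
    return _Offload_shared_aligned_malloc(size, alignment);
#else
    /* Fallback only: aligned_alloc wants size to be a multiple of alignment. */
    return aligned_alloc(alignment, round_up(size, alignment));
#endif
}

/* Hypothetical wrapper: release memory obtained from shared_alloc_aligned. */
void shared_free_aligned(void *p)
{
#ifdef __INTEL_OFFLOAD
    _Offload_shared_aligned_free(p);
#else
    free(p);
#endif
}
```

A caller would typically cast the returned pointer to a `_Cilk_shared` pointer type and release it with the matching free function, never with plain `free` when the Intel allocator was used.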
Note that memory allocated with these APIs is not actually “shared memory”: no hardware maps any portion of
the memory on the Intel® Xeon Phi™ Coprocessor into the host system. The memory subsystems on the
coprocessor and host are completely independent, and this programming model is just a different way of
copying data between these memory subsystems at well-defined synchronization points. The copying is
implicit, in that these synchronization points (offload calls marked with _Cilk_offload) do not specify
what data to copy. Rather, the runtime determines what data has changed between the host and coprocessor,
and copies only the deltas at the beginning and end of the offload function call.
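To make the idea of copying only the deltas more concrete, here is a toy model in plain C. It is not the Intel runtime's implementation, which the text above does not describe. The function name `sync_deltas`, the `CHUNK` granularity, and the approach of comparing against a snapshot taken at the previous synchronization point are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 4096  /* hypothetical transfer granularity in bytes */

/*
 * Toy model of one synchronization point.
 * src      : the side whose changes are being published
 * dst      : the other side's copy of the same buffer
 * snapshot : contents of src as of the previous synchronization point
 * Copies only the chunks of src that differ from snapshot, refreshes the
 * snapshot for those chunks, and returns the number of bytes copied.
 */
size_t sync_deltas(unsigned char *dst, const unsigned char *src,
                   unsigned char *snapshot, size_t n)
{
    assert(dst != NULL && src != NULL && snapshot != NULL);
    size_t copied = 0;
    for (size_t off = 0; off < n; off += CHUNK) {
        size_t len = (n - off < CHUNK) ? (n - off) : CHUNK;
        if (memcmp(src + off, snapshot + off, len) != 0) {
            memcpy(dst + off, src + off, len);       /* transfer the delta */
            memcpy(snapshot + off, src + off, len);  /* remember new state */
            copied += len;
        }
    }
    return copied;
}
```

In this model, a buffer that has not been touched since the last synchronization point costs a comparison but no transfer, which is the property the paragraph above attributes to the runtime.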
The following code sample demonstrates the use of the _Cilk_shared and _Cilk_offload keywords
and the dynamic allocation of “shared” memory.
#include <stdio.h>
#include <omp.h>

float * _Cilk_shared data; // pointer to "shared" memory

_Cilk_shared float MIC_OMPReduction(int size)
{
#ifdef __MIC__
    float Result = 0.0f; // the reduction variable must start at zero
    int nThreads = 32;
    omp_set_num_threads(nThreads);
    #pragma omp parallel for reduction(+:Result)
    for (int i = 0; i < size; ++i)
    {
        Result += data[i];
    }
    return Result;
#else
    printf("Intel(R) Xeon Phi(TM) Coprocessor not available\n");
    return 0.0f;
#endif
}

int main()
{
    size_t size = 1000000;
    size_t n_bytes = size * sizeof(float);

    // allocate in "shared" memory so the runtime can synchronize it
    data = (_Cilk_shared float *)_Offload_shared_malloc(n_bytes);
    for (size_t i = 0; i < size; ++i)
    {
        data[i] = i % 10;
    }

    // run the reduction on the coprocessor and keep its return value
    float result = _Cilk_offload MIC_OMPReduction((int)size);
    printf("Result = %f\n", result);

    _Offload_shared_free(data);
    return 0;
}
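When checking what the offloaded reduction returns, a host-only reference value can help. The sketch below is plain C with no offload keywords; `reference_sum` and `matches_reference` are hypothetical helper names, and the relative tolerance is an assumption about how much single-precision rounding one is willing to accept. It reproduces the `i%10` fill pattern from the sample and accumulates in `double`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sum of (i % 10) for i in [0, n), the sample's fill. */
double reference_sum(size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        sum += (double)(i % 10);
    }
    return sum;
}

/*
 * Hypothetical helper: returns 1 if `got` is within a relative tolerance
 * of the reference sum for n elements, and 0 otherwise.
 */
int matches_reference(float got, size_t n, double rel_tol)
{
    assert(rel_tol >= 0.0);
    double want = reference_sum(n);
    double diff = (double)got - want;
    if (diff < 0.0) {
        diff = -diff;
    }
    double scale = (want < 0.0) ? -want : want;
    return diff <= rel_tol * scale;
}
```

For the sample's 1,000,000 elements the reference value is 4,500,000, since every run of ten consecutive elements sums to 45.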