HP-MPI User's Guide (11th Edition)
Example applications
multi_par.f
Appendix A
partitioning used to parallelize the first outer-loop can also
accommodate the parallelization of the second outer-loop. The
partitioning of the array is shown in Figure A-1.
Figure A-1 Array partitioning
In this sample program, the rank n process is assigned partition n when
the distribution is initialized. Because these partitions are not
contiguous regions of memory, an MPI derived datatype is used to describe
the partition layout to the MPI system.
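
To make the non-contiguity concrete, the following is a minimal sketch in C (not taken from multi_par.f) that computes where one rectangular block lives inside a column-major array, as Fortran stores it. The names block_layout, describe_block, and block_elem_offset are hypothetical, and the sketch assumes the array divides evenly into equal blocks. The count, blocklength, and stride fields correspond to the arguments of the same names that MPI_Type_vector takes; the sketch only computes those values and makes no MPI calls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor for one rectangular block of a column-major
 * (Fortran-order) array. The first three fields mirror the count,
 * blocklength, and stride arguments of MPI_Type_vector. */
typedef struct {
    int count;        /* number of contiguous runs (one per block column) */
    int blocklength;  /* elements in each run (rows in the block) */
    int stride;       /* elements between the starts of consecutive runs */
    size_t offset;    /* element offset of the block's first entry */
} block_layout;

/* Describe block [rb, cb] of an nrows x ncols array split into
 * nblk x nblk equal blocks. Assumes nblk divides both dimensions. */
block_layout describe_block(int nrows, int ncols, int nblk, int rb, int cb)
{
    assert(nblk > 0 && nrows % nblk == 0 && ncols % nblk == 0);
    assert(rb >= 0 && rb < nblk && cb >= 0 && cb < nblk);

    int brows = nrows / nblk;  /* rows per block */
    int bcols = ncols / nblk;  /* columns per block */

    block_layout b;
    b.count = bcols;
    b.blocklength = brows;
    b.stride = nrows;  /* leading dimension of the column-major array */
    b.offset = (size_t)cb * bcols * nrows + (size_t)rb * brows;
    return b;
}

/* Element offset of entry (i, j) of the block, both 0-based. */
size_t block_elem_offset(const block_layout *b, int i, int j)
{
    assert(i >= 0 && i < b->blocklength && j >= 0 && j < b->count);
    return b->offset + (size_t)j * b->stride + (size_t)i;
}
```

For an 8 x 8 array split into 4 x 4 blocks, describe_block(8, 8, 4, 0, 2) yields two runs of two elements each, starting at element offset 32 and spaced 8 elements apart. The gap between the runs is what prevents the block from being sent as a single contiguous buffer.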
Each process starts by computing summations in row-wise fashion. For
example, the rank 2 process starts with the block in row block 0 and
column block 2 (denoted [0,2]).
The block computed in the second step is [1,3]. Computing the first-row
elements of this block requires the last-row elements of the [0,3] block,
which the rank 3 process computes in the first step. Thus, the rank 2
process receives this data from the rank 3 process at the beginning of the
second step. The rank 2 process also sends the last-row elements of the
[0,2] block to the rank 1 process, which computes [1,2] in the second
step. By repeating these steps, all processes finish the row-wise
summations (the first outer-loop in the illustrated program).
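
The stepping pattern described above can be sketched as a few index calculations in C. This is an illustration, not code from multi_par.f: the function names are hypothetical, and the general formulas are inferred from the single worked example in the text (rank 2 handling [0,2] and then [1,3]), assuming the number of block rows and block columns equals the number of processes.

```c
#include <assert.h>

/* Column block that `rank` works on at `step`; the row block equals the
 * step. Inferred from the text: rank 2 handles [0,2], then [1,3]. */
int column_block_at_step(int rank, int step, int nprocs)
{
    assert(nprocs > 0 && rank >= 0 && rank < nprocs && step >= 0);
    return (rank + step) % nprocs;
}

/* Rank that owns block [row_blk, col_blk] under the same mapping.
 * The extra "+ nprocs" keeps the result non-negative in C. */
int block_owner(int row_blk, int col_blk, int nprocs)
{
    assert(nprocs > 0);
    return ((col_blk - row_blk) % nprocs + nprocs) % nprocs;
}

/* Rank whose last-row data `rank` needs before starting `step`:
 * the owner of the block directly above. Valid for step >= 1. */
int recv_source(int rank, int step, int nprocs)
{
    assert(step >= 1);
    int col = column_block_at_step(rank, step, nprocs);
    return block_owner(step - 1, col, nprocs);
}

/* Rank that needs the last row of the block `rank` finished at `step`:
 * the owner of the block directly below. Valid while a row below exists. */
int send_dest(int rank, int step, int nprocs)
{
    assert(step + 1 < nprocs);
    int col = column_block_at_step(rank, step, nprocs);
    return block_owner(step + 1, col, nprocs);
}
```

With four processes, recv_source(2, 1, 4) evaluates to 3 and send_dest(2, 0, 4) evaluates to 1, matching the exchange described for the rank 2 process. Under this mapping every rank receives from rank + 1 and sends to rank - 1 (modulo the process count) at each step.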
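[Figure A-1: a 4 x 4 grid of array blocks indexed by row block (0 to 3) and column block (0 to 3); each of the 16 blocks is labeled with a partition number from 0 to 3.]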