Technical data

Chapter 6: Compiling and Debugging Parallel Fortran
Repeat Step 5: Debug on a Single Processor
Because the sum reductions are now done in parallel, the additions occur
in a different order than in the original loop. Floating-point addition is
not associative, so the answers may not exactly match the original. Be
careful to distinguish between real errors and variations introduced by
round-off. In this example, the answers agreed with the original to 10 digits.
Repeat Step 6: Run the Parallel Version
Again, because of round-off, the answers produced vary slightly depending
on the number of processors used to execute the program. This variation
must be distinguished from any actual error.
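The effect can be reproduced without any multiprocessing at all. The following is a minimal sketch, not the routine from this example: it emulates a sum reduction by forming one partial sum per hypothetical processor and then combining them. The program name RNDOFF, the array contents, the interleaved assignment of iterations, and the 1.0D-10 tolerance are all assumptions chosen for illustration.

```fortran
      PROGRAM RNDOFF
C     Sketch only: emulates a sum reduction split across NP
C     hypothetical processors by forming NP partial sums and
C     then combining them. No multiprocessing directives used.
      INTEGER N, MAXP
      PARAMETER (N = 100000, MAXP = 8)
      DOUBLE PRECISION A(N), SERIAL, PART(MAXP), TOTAL, RELERR
      INTEGER I, K, P, NP
C     Fill the array with terms of widely varying magnitude.
      DO 10 I = 1, N
         A(I) = 1.0D0 / DBLE(I)
   10 CONTINUE
C     Reference answer: one accumulator, original loop order.
      SERIAL = 0.0D0
      DO 20 I = 1, N
         SERIAL = SERIAL + A(I)
   20 CONTINUE
C     Repeat the reduction with 1, 2, 4, and 8 partial sums.
      DO 70 K = 0, 3
         NP = 2**K
         DO 40 P = 1, NP
            PART(P) = 0.0D0
   40    CONTINUE
C        Iteration I goes to partial sum MOD(I-1,NP)+1.
         DO 50 I = 1, N
            P = MOD(I - 1, NP) + 1
            PART(P) = PART(P) + A(I)
   50    CONTINUE
         TOTAL = 0.0D0
         DO 60 P = 1, NP
            TOTAL = TOTAL + PART(P)
   60    CONTINUE
C        Compare with a relative tolerance, not exact equality.
         RELERR = ABS(TOTAL - SERIAL) / ABS(SERIAL)
         PRINT *, NP, TOTAL, RELERR
         IF (RELERR .GT. 1.0D-10) PRINT *, '  possible real error'
   70 CONTINUE
      END
```

With one partial sum the result is identical to the serial answer. With more, the low-order digits may differ slightly from one count to the next, which is the kind of variation to expect. A difference well above the tolerance is more likely a real error, such as a data dependence that was missed.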
Repeat Step 7: Profile the Parallel Version
The output from the pixie run for this routine looks like this:
% prof -pixie -quit 1% try2.mp try2.mp.Addrs try2.mp.Counts00423
----------------------------------------------------------
* -p[rocedures] using basic-block counts; sorted in      *
* descending order by the number of cycles executed in   *
* each procedure; unexecuted procedures are excluded     *
----------------------------------------------------------
10036679 cycles

    cycles %cycles  cum %     cycles  bytes procedure (file)
                               /call  /line
   6016033   59.94  59.94     139908     16 mp_waitmaster (mp_simple_sched.s)
   3028682   30.18  90.12     144223     31 _calc_88_aaab (/tmp/fMPcalc_)
    282980    2.82  92.94      14149     58 move_ (/tmp/ctmpa00837)
    194040    1.93  94.87       9240     41 calc_ (/tmp/ctmpa00881)
    115743    1.15  96.02        137     70 t_putc (lio.c)
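As an aid to reading the listing, the two percentage columns follow directly from the cycle counts: %cycles is each procedure's cycles divided by the 10036679-cycle total, and cum % is the running sum. The short sketch below recomputes them from the five rows shown. The program name PCTCYC and the output format are assumptions for illustration, and this is not part of prof itself.

```fortran
      PROGRAM PCTCYC
C     Sketch: recompute the %cycles and cum % columns from the
C     cycle counts shown in the prof listing.
      INTEGER NROWS
      PARAMETER (NROWS = 5)
      INTEGER CYCLES(NROWS), TOTAL, I
      DOUBLE PRECISION PCT, CUM
C     Per-procedure cycle counts, in listing order.
      DATA CYCLES /6016033, 3028682, 282980, 194040, 115743/
C     Total cycles reported at the top of the listing.
      DATA TOTAL /10036679/
      CUM = 0.0D0
      DO 10 I = 1, NROWS
         PCT = 100.0D0 * DBLE(CYCLES(I)) / DBLE(TOTAL)
         CUM = CUM + PCT
         PRINT '(I10, 2F8.2)', CYCLES(I), PCT, CUM
   10 CONTINUE
      END
```

For the first row this gives 100 * 6016033 / 10036679, or about 59.94 percent, in agreement with the listing.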
With this rewrite, calc_ now accounts for only a small part of the total. You
have pushed most of the work into the parallel region. Because you added a
multiprocessed initialization loop before the main loop, that new loop is