Chapter 6: Compiling and Debugging Parallel Fortran
With these final fixes in place, repeat the same steps to verify the changes:
1. Debug on a single processor.
2. Run the parallel version.
3. Debug the parallel version.
4. Profile the parallel version.
Repeat Step 7 Again: Profile
The pixie output for the latest version of the code looks like this:
% prof -pixie -quit 1% try3.mp try3.mp.Addrs try3.mp.Counts00425
----------------------------------------------------------
*  -p[rocedures] using basic-block counts; sorted in     *
*  descending order by the number of cycles executed in  *
*  each procedure; unexecuted procedures are excluded    *
----------------------------------------------------------

7045818 cycles

    cycles %cycles  cum %     cycles  bytes procedure (file)
                               /call  /line

   5960816   84.60  84.60     283849     31 _calc_2_ (/tmp/fMPcalc_)
    282980    4.02  88.62      14149     58 move_ (/tmp/ctmpa00837)
    179893    2.75  91.37       4184     16 mp_waitmaster (mp_simple_sched.s)
    159978    2.55  93.92       7618     41 calc_ (/tmp/ctmpa00941)
    115743    1.64  95.56        137     70 t_putc (lio.c)
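This looks good: 84.60 percent of the cycles are now spent in _calc_2_,
and mp_waitmaster accounts for less than 3 percent. To be sure you have
solved the load-balancing problem, run prof on the counts file of each
slave process and check that the slaves spend roughly equal amounts of
time in _calc_2_. Once this is verified, you are finished.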
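If there are many slave processes, the comparison can be scripted. The following Python sketch is one possible way to do it and is not part of the Fortran tools. It assumes you have saved the prof report for each process as text, and that each procedure row has the same column layout as the listing above. The names cycles_for and is_balanced are hypothetical, and the 10 percent tolerance is an arbitrary reading of "roughly equal".

```python
import re

# One procedure row of a prof -pixie report, for example:
#   5960816  84.60  84.60  283849  31 _calc_2_ (/tmp/fMPcalc_)
# Columns: cycles, %cycles, cum %, cycles/call, bytes/line, procedure (file).
ROW = re.compile(
    r"^\s*(\d+)\s+([\d.]+)\s+([\d.]+)\s+(\d+)\s+(\d+)\s+(\S+)\s+\((.*)\)\s*$"
)


def cycles_for(report, procedure):
    """Return the cycle count listed for `procedure` in one report, or None."""
    for line in report.splitlines():
        match = ROW.match(line)
        if match and match.group(6) == procedure:
            return int(match.group(1))
    return None


def is_balanced(cycle_counts, tolerance=0.10):
    """Return True if every count lies within `tolerance` of the mean.

    The default tolerance of 10 percent is an assumption, not a value
    taken from the manual.
    """
    mean = sum(cycle_counts) / len(cycle_counts)
    return all(abs(count - mean) <= tolerance * mean for count in cycle_counts)


if __name__ == "__main__":
    # Row copied from the master listing above.
    master_report = "5960816 84.60 84.60 283849 31 _calc_2_ (/tmp/fMPcalc_)"
    print(cycles_for(master_report, "_calc_2_"))

    # Hypothetical cycle counts, invented for illustration only; replace
    # them with the values extracted from each slave's report.
    print(is_balanced([100, 105, 95]))
```

To use the sketch, call cycles_for once per saved slave report with the procedure name _calc_2_, collect the results in a list, and pass the list to is_balanced. A False result suggests that the work is still unevenly distributed.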










