Parallel Programming Exercise
now named _calc_88_aaaa and the main loop is now _calc_88_aaab. The
initialization took less than 1 percent of the total time and so does not even
appear on the listing.
The large number for the routine mp_waitmaster indicates a problem. Look
at the pixie run for the slave process:
% prof -pixie -quit 1% try2.mp try2.mp.Addrs try2.mp.Counts00424
----------------------------------------------------------
* -p[rocedures] using basic-block counts; sorted in *
* descending order by the number of cycles executed in *
* each procedure; unexecuted procedures are excluded *
----------------------------------------------------------
10704474 cycles

   cycles  %cycles  cum %   cycles   bytes  procedure (file)
                            /call    /line

  7701642    71.95  71.95   366745      31  _calc_2_ (/tmp/fMPcalc_)
  2909559    27.18  99.13    67665      32  mp_slave_wait_for_work (mp_slave.s)
The slave is spending more than twice as many cycles in the main
multiprocessed loop as the master. This is a severe load balancing problem.
Repeat Step 3 Again: Analyze
Examine the loop again. Because the inner loop runs from 1 to I-1, the first
few iterations of the outer loop contain far less work than the last
iterations. Try breaking the loop into interleaved pieces rather than
contiguous pieces. Also, because the leftmost index of the PARTIAL array
should vary the fastest (Fortran stores arrays in column-major order), flip
the order of its dimensions. For fun, we will also put some loop unrolling
in the initialization loop, although this is a marginal optimization because
the initialization accounts for less than 1 percent of the total execution
time.










