Parallel Programming Exercise
You have parallelized the distance calculations, leaving the summations to
be done serially. Because you did not alter the order of the summations, this
should produce exactly the same answer as the original version.
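The summation order matters because floating-point addition is not associative: combining per-thread partial sums can round differently from a single left-to-right sum. The following is a minimal sketch of that effect, not code from the exercise. It assumes a POSIX shell with awk, and the function names sum_in_order and sum_partials are made up for illustration.

```shell
#!/bin/sh
# Sketch only: the same three values added in two different groupings.
# sum_in_order and sum_partials are hypothetical names for illustration.

# Serial summation, left to right, as the original loop would do it.
sum_in_order() {
    awk 'BEGIN { printf "%.17g\n", (0.1 + 0.2) + 0.3 }'
}

# Two partial sums combined at the end, as a parallel reduction might do it:
# one thread holds 0.1, the other holds 0.2 + 0.3.
sum_partials() {
    awk 'BEGIN { printf "%.17g\n", 0.1 + (0.2 + 0.3) }'
}
```

With IEEE double-precision arithmetic the two functions print values that differ in the last digits, which is why the exercise keeps the summation serial and in its original order.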
Step 5: Debug on a Single Processor
The temptation might be strong to rush the rewritten code directly to the
multiprocessor at this point. Remember, single-process debugging is easier
than multiprocess debugging. Spend time now to compile and correct the
code without the -mp flag to save time later.
A few iterations should get it right.
Step 6: Run the Parallel Version
Compile the code with the -mp flag. As a further check, do the first run with
the environment variable MP_SET_NUMTHREADS set to 1. When this
works, set MP_SET_NUMTHREADS to 2, and run the job multiprocessed.
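The one-thread and two-thread runs can be scripted so that their outputs are compared automatically. This is a sketch under assumptions: it uses POSIX sh rather than the csh prompt shown in this guide (in csh the variable would be set with setenv), the helper names run_with_threads and compare_thread_runs are hypothetical, and the program is assumed to write its results to standard output.

```shell
#!/bin/sh
# Sketch only: run a program with 1 thread and with 2 threads, then compare.
# run_with_threads and compare_thread_runs are hypothetical helper names.

# Run a command with MP_SET_NUMTHREADS set for that command only.
# Usage: run_with_threads N command [args...]
run_with_threads() {
    nthreads=$1
    shift
    env MP_SET_NUMTHREADS="$nthreads" "$@"
}

# Run the same command with 1 and 2 threads and report whether the
# standard output of the two runs is byte-for-byte identical.
# Returns 0 on a match, 1 on a difference, 2 if a temp file cannot be made.
compare_thread_runs() {
    out1=$(mktemp) || return 2
    out2=$(mktemp) || { rm -f "$out1"; return 2; }
    run_with_threads 1 "$@" > "$out1"
    run_with_threads 2 "$@" > "$out2"
    if cmp -s "$out1" "$out2"; then
        echo "outputs match"
        status=0
    else
        echo "outputs differ"
        status=1
    fi
    rm -f "$out1" "$out2"
    return $status
}
```

For example, compare_thread_runs ./try1 would run an executable named try1 (the name here is only an example) both ways. A report that the outputs differ is the signal that the multiprocessed run needs debugging.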
Step 7: Debug the Parallel Version
If you get the correct output from the version with one thread but not from
the version with multiple threads, you need to debug the program while
running multiprocessed. Refer to "General Debugging Hints" on page 110
for help.
Step 8: Profile the Parallel Version
After the parallel job executes correctly, check whether the run time has
improved. First, compare an execution profile of the modified code compiled
without -mp with the original profile. This is important because, in
rewriting the code for parallelism, you may have introduced new work. In
this example, writing and reading the FLAGS array, plus the overhead of the
two new DO loops, are significant.
The pixie output on the modified code shows the difference:
% prof -pixie -quit 1% try1 try1.Addrs try1.Counts