Parallel Programming Exercise
You have parallelized the distance calculations, leaving the summations to
be done serially. Because you did not alter the order of the summations, this
should produce exactly the same answer as the original version.
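The order of the summations matters because floating-point addition is not associative. The following Python sketch is not part of the exercise; the values are made up and chosen only to make the effect visible. It adds the same four numbers once strictly in order, as a serial loop would, and once as two partial sums that are then combined, as a two-thread summation might.

```python
# Illustrative values only; they are not taken from the exercise.
values = [1.0e16, 1.0, -1.0e16, 1.0]


def serial_sum(xs):
    """Add the elements strictly left to right, as a serial loop would."""
    total = 0.0
    for x in xs:
        total += x
    return total


def two_way_partial_sum(xs):
    """Add each half separately, then combine the two partial sums."""
    mid = len(xs) // 2
    return serial_sum(xs[:mid]) + serial_sum(xs[mid:])


in_order = serial_sum(values)            # 1.0
regrouped = two_way_partial_sum(values)  # 0.0

print(in_order, regrouped)
```

The two results differ even though the inputs are identical. Keeping the summations serial avoids this regrouping, which is why the rewritten code can be expected to match the original exactly.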
Step 5: Debug on a Single Processor
The temptation might be strong to rush the rewritten code directly to the
multiprocessor at this point. Remember, single-process debugging is easier
than multiprocess debugging. Spend time now to compile and correct the
code without the -mp flag to save time later.
A few iterations should get it right.
Step 6: Run the Parallel Version
Compile the code with the -mp flag. As a further check, do the first run with
the environment variable MP_SET_NUMTHREADS set to 1. When this
works, set MP_SET_NUMTHREADS to 2, and run the job multiprocessed.
Step 7: Debug the Parallel Version
If you get the correct output from the version with one thread but not from
the version with multiple threads, you need to debug the program while
running multiprocessed. Refer to “General Debugging Hints” on page 110
for help.
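Before debugging, it helps to know exactly where the two outputs first diverge. The sketch below is one possible way to find that point. It assumes the output of each run has been captured as a list of text lines; the function names and the sample lines are hypothetical and do not come from the exercise. Fields that parse as numbers are compared numerically, and the default tolerance of zero reflects the expectation that the answers match exactly.

```python
def fields_match(a, b, tolerance):
    """Compare two fields numerically if both parse as numbers, else as text."""
    try:
        return abs(float(a) - float(b)) <= tolerance
    except ValueError:
        return a == b


def first_mismatch(reference_lines, candidate_lines, tolerance=0.0):
    """Return the 1-based number of the first differing line, or None."""
    pairs = zip(reference_lines, candidate_lines)
    for number, (ref, cand) in enumerate(pairs, start=1):
        ref_fields = ref.split()
        cand_fields = cand.split()
        if len(ref_fields) != len(cand_fields):
            return number
        for a, b in zip(ref_fields, cand_fields):
            if not fields_match(a, b, tolerance):
                return number
    # All shared lines agree; a length difference is still a mismatch.
    if len(reference_lines) != len(candidate_lines):
        return min(len(reference_lines), len(candidate_lines)) + 1
    return None


# Hypothetical captured output from a one-thread and a two-thread run.
one_thread = ["1 0.500000", "2 0.750000", "3 0.250000"]
two_threads = ["1 0.500000", "2 0.812500", "3 0.250000"]

print(first_mismatch(one_thread, two_threads))  # 2
print(first_mismatch(one_thread, one_thread))   # None
```

The line number it reports indicates which part of the computation to examine first in the multiprocessed run.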
Step 8: Profile the Parallel Version
After the parallel job executes correctly, check whether the run time has
improved. First, compile the modified code without -mp and compare its
execution profile with the original profile. This is important because, in
rewriting the code for parallelism, you may have introduced new work. In
this example, writing and reading the FLAGS array, plus the overhead of the
two new DO loops, are significant.
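The effect of the added work on the final result can be estimated with simple arithmetic. The sketch below uses made-up costs in arbitrary units; none of the numbers are measurements from this exercise, and the function names are hypothetical. It shows why the parallel run should be judged against the original code and not against the modified serial code.

```python
def added_work_fraction(original_cost, modified_cost):
    """Fraction of extra work the rewrite added to the serial code."""
    return (modified_cost - original_cost) / original_cost


def speedup(baseline_cost, parallel_cost):
    """How many times faster the parallel run is than the baseline."""
    return baseline_cost / parallel_cost


# Hypothetical costs, in arbitrary units.
original = 10.0   # original serial code
modified = 12.0   # rewritten code, compiled without -mp
parallel = 7.0    # rewritten code, compiled with -mp, two threads

overhead = added_work_fraction(original, modified)
real_gain = speedup(original, parallel)
apparent_gain = speedup(modified, parallel)

print(f"added work: {overhead:.0%}")
print(f"speedup over original: {real_gain:.2f}")
print(f"speedup over modified serial code: {apparent_gain:.2f}")
```

With these assumed numbers, the rewrite adds 20 percent more work, so the gain measured against the modified serial code overstates the gain actually achieved over the original.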
The pixie output on the modified code shows the difference:
% prof -pixie -quit 1% try1 try1.Addrs try1.Counts










