Technical data

Parallel Programming Exercise

You have parallelized the distance calculations, leaving the summations to
be done serially. Because you did not alter the order of the summations, this
should produce exactly the same answer as the original version.
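
The order of the summations matters because floating-point addition is not associative. The following is a minimal sketch, not part of the original exercise: it uses awk from a Bourne shell script (an assumption made only for illustration) to add the same three terms in two different groupings. The function name sum_order_matches is hypothetical.

```shell
#!/bin/sh
# Floating-point addition is not associative: grouping the same three
# terms differently can change the last bits of the result.
sum_order_matches() {
    awk 'BEGIN {
        left  = (0.1 + 0.2) + 0.3   # summed left to right
        right = 0.1 + (0.2 + 0.3)   # same terms, different grouping
        if (left == right) print "same"; else print "different"
    }'
}

sum_order_matches
```

On a system with IEEE double-precision arithmetic this reports that the two sums are different. A parallel reduction that regroups the terms can therefore change the low-order bits of the result, which is why keeping the summations serial preserves the original answer.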

Step 5: Debug on a Single Processor

The temptation might be strong to rush the rewritten code directly to the
multiprocessor at this point. Remember, single-process debugging is easier
than multiprocess debugging. Spend time now to compile and correct the
code without the -mp flag to save time later.

A few iterations should get it right.

Step 6: Run the Parallel Version

Compile the code with the -mp flag. As a further check, do the first run with
the environment variable MP_SET_NUMTHREADS set to 1. When this
works, set MP_SET_NUMTHREADS to 2, and run the job multiprocessed.
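
As a sketch, the two runs can be scripted so that their outputs are compared automatically. The function name compare_runs and the temporary file names are hypothetical, the example uses Bourne shell syntax rather than csh, and it assumes the program writes its results to standard output.

```shell
#!/bin/sh
# compare_runs CMD [ARGS...]
# Run the command once with one thread and once with two threads, then
# report whether the two outputs are byte-for-byte identical.
compare_runs() {
    out1=${TMPDIR:-/tmp}/mp_run1.$$
    out2=${TMPDIR:-/tmp}/mp_run2.$$

    # MP_SET_NUMTHREADS is set only for the duration of each run.
    MP_SET_NUMTHREADS=1 "$@" > "$out1" 2>&1
    MP_SET_NUMTHREADS=2 "$@" > "$out2" 2>&1

    if cmp -s "$out1" "$out2"; then
        echo "outputs match"
        status=0
    else
        echo "outputs differ"
        status=1
    fi
    rm -f "$out1" "$out2"
    return $status
}
```

For example, compare_runs ./try1 would run the executable both ways. A report of "outputs differ" is the signal that the multiprocessed version needs debugging.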

Step 7: Debug the Parallel Version

If you get the correct output from the version with one thread but not from
the version with multiple threads, you need to debug the program while
running multiprocessed. Refer to “General Debugging Hints” on page 110
for help.

Step 8: Profile the Parallel Version

After the parallel job executes correctly, check whether the run time has
improved. First, compare an execution profile of the modified code compiled
without -mp with the original profile. This is important because, in
rewriting the code for parallelism, you may have introduced new work. In
this example, the cost of writing and reading the FLAGS array, plus the
overhead of the two new DO loops, is significant.

The pixie output on the modified code shows the difference:

% prof -pixie -quit 1% try1 try1.Addrs try1.Counts