Technical data
After linking, the resulting executable can be run like any standard
executable. Creating the execution threads, running and synchronizing
them, and terminating the task are all handled automatically.
When an executable has been linked with -mp, the Fortran initialization
routines determine how many parallel threads of execution to create. This
determination occurs each time the task starts; the number of threads is not
compiled into the code. The default is to use the number of processors that
are on the machine (the value returned by the system call
sysmp(MP_NAPROCS); see the sysmp(2) man page). The default can be
overridden by setting the shell environment variable
MP_SET_NUMTHREADS. If it is set, Fortran tasks will use the specified
number of execution threads regardless of the number of processors
physically present on the machine. MP_SET_NUMTHREADS can be set to any
integer from 1 to 16.
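As a minimal sketch of overriding the default, assuming a Bourne-compatible
shell, the helper below checks a requested thread count against the documented
range of 1 to 16 and exports it. The function name set_numthreads and the
executable name a.out are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of IRIX): validate a thread count against the
# documented range of 1 to 16, then export MP_SET_NUMTHREADS.
set_numthreads() {
    # Reject empty or non-numeric input.
    case "$1" in
        ''|*[!0-9]*) echo "thread count must be an integer" >&2; return 1 ;;
    esac
    # Reject values outside the documented range.
    if [ "$1" -lt 1 ] || [ "$1" -gt 16 ]; then
        echo "thread count must be from 1 to 16" >&2
        return 1
    fi
    MP_SET_NUMTHREADS=$1
    export MP_SET_NUMTHREADS
}

# Request four threads for programs started from this shell.
set_numthreads 4 && echo "MP_SET_NUMTHREADS=$MP_SET_NUMTHREADS"
# A program linked with -mp would then be run as usual, for example:
#   ./a.out        (hypothetical executable name)
```

Because the thread count is read each time the task starts, the same
executable can be rerun with a different value without relinking. In csh, the
equivalent setting is setenv MP_SET_NUMTHREADS 4.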
Profiling a Parallel Fortran Program
After converting a program to run in parallel, examine its execution profiles
to judge the effectiveness of the transformation. Good execution profiles are
crucial for focusing your effort on the loops that consume the most time.
IRIX provides profiling tools that can be used on Fortran parallel programs.
Both pixie(1) and pc-sample profiling can be used. On jobs that use multiple
threads, each of these methods creates multiple profile data files, one for
each thread. The standard profile analyzer prof(1) can be used to examine
this output.
The profile of a Fortran parallel job is different from a standard profile. As
mentioned in “Analyzing Data Dependencies for Multiprocessing” on page
79, to produce a parallel program, the compiler pulls the parallel DO loops
out into separate subroutines, one routine for each loop. Each of these loops
is shown as a separate procedure in the profile. Comparing the amount of
time spent in each loop by the various threads shows how well the workload
is balanced.
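One way to make this comparison concrete is sketched below, assuming the time
each thread spent in a given loop procedure has already been read from the
per-thread prof(1) output by hand. The function name imbalance and the choice
of metric (slowest thread's time divided by the mean time) are assumptions
made for illustration, not something the profiling tools report, and the
timings shown are made-up values.

```shell
#!/bin/sh
# Hypothetical helper: take one time value per thread for a single loop
# procedure and print the ratio of the largest time to the mean time.
# A ratio near 1.00 suggests the loop's work is evenly divided.
imbalance() {
    printf '%s\n' "$@" | awk '
        { sum += $1; if (NR == 1 || $1 > max) max = $1 }
        END { if (NR > 0 && sum > 0) printf "%.2f\n", max / (sum / NR) }'
}

# Illustrative, made-up timings in seconds for four threads.
imbalance 2.0 2.0 2.0 2.0    # prints 1.00 (balanced)
imbalance 1.0 1.0 1.0 5.0    # prints 2.50 (one thread does most of the work)
```

A loop whose ratio is well above 1 is a candidate for a closer look, since
the other threads sit idle while the slowest one finishes.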










