User's Guide
2 Parallel for-Loops (parfor)
Improving parfor Performance
Where to Create Arrays
With a parfor-loop, it might be faster to have each MATLAB worker create its own
arrays, or portions of them, in parallel, rather than to create a large array in the client
before the loop and send it out to all the workers separately. Having each worker create
its own copy of these arrays inside the loop saves the time of transferring the data from
client to workers, and all the workers can create their arrays at the same time. This might
run counter to your usual practice of doing as much variable initialization as possible
before a for-loop, so that you do not needlessly repeat it inside the loop.
Whether to create arrays before the parfor-loop or inside the parfor-loop depends on
the size of the arrays, the time needed to create them, whether the workers need all or
part of the arrays, the number of loop iterations that each worker performs, and other
factors. While many for-loops can be directly converted to parfor-loops, even in these
cases there might be other issues involved in optimizing your code.
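As a rough way to explore this trade-off, the following sketch times both approaches. It is only an illustration: the values of n and numIter and all variable names are assumptions chosen for the example, and the timings depend heavily on your hardware and pool.

```matlab
% Illustrative sketch: create an array on the client versus on the workers.
% n and numIter are arbitrary example values, not recommendations.
pool = gcp;                 % use the current pool, or start one if none exists
n = 2000;
numIter = 8;
result = zeros(1, numIter);

% Option 1: create the array once on the client.
% A is then sent in full to every worker that runs iterations.
tic
A = ones(n);
parfor i = 1:numIter
    result(i) = sum(A(:)) + i;
end
tClient = toc;

% Option 2: create the array inside the loop.
% B is a temporary variable that exists only on the workers,
% so nothing is transferred, but it is rebuilt in every iteration.
tic
parfor i = 1:numIter
    B = ones(n);
    result(i) = sum(B(:)) + i;
end
tWorker = toc;

fprintf('Created on client:  %.3f s\n', tClient);
fprintf('Created on workers: %.3f s\n', tWorker);
```

Because Option 2 repeats the creation in every iteration, it tends to pay off only when creating the array is cheap relative to transferring it.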
Slicing Arrays
If a variable is initialized before a parfor-loop, then used inside the parfor-loop, it has
to be passed to each MATLAB worker evaluating the loop iterations. Only those variables
used inside the loop are passed from the client workspace. However, if all occurrences of
the variable are indexed by the loop variable, each worker receives only the part of the
array it needs.
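The difference can be sketched as follows. The matrix M, its size, and the output names are assumptions made up for this illustration.

```matlab
% Illustrative sketch: the same array used as a sliced variable and
% as a variable that must be sent in full.
n = 1000;
M = rand(n);                % created on the client before the loops
colSum = zeros(1, n);
total = zeros(1, n);

% Every occurrence of M is indexed by the loop variable i, so each
% worker receives only the columns for the iterations it evaluates.
parfor i = 1:n
    colSum(i) = sum(M(:, i));
end

% Here M also appears with an index that does not involve i, so the
% entire array is passed to each worker.
parfor i = 1:n
    total(i) = sum(M(:, i)) + M(1, 1);
end
```

In the second loop, copying M(1, 1) into a scalar before the loop would let M be sliced again, because the remaining occurrence is indexed only by i.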
Optimizing on Local vs. Cluster Workers
Running your code on local workers lets you test your application without using
cluster resources. However, local workers have certain drawbacks and limitations.
Because the data is not transferred over a network, transfer times observed with
local workers might not indicate how transfers will behave on a cluster.
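One way to judge transfer cost independently of where the workers run is to measure how much data a loop moves, rather than how long the transfer takes. The sketch below assumes a MATLAB release that provides the ticBytes and tocBytes functions; the array A and the loop body are placeholders for your own code.

```matlab
% Illustrative sketch: measure the data sent to and from the workers.
% Assumes ticBytes and tocBytes are available in your release.
pool = gcp;                 % use the current pool, or start one if none exists
A = ones(1000);             % example data created on the client
s = zeros(1, 8);

ticBytes(pool);             % start counting bytes transferred
parfor i = 1:8
    s(i) = sum(A(:)) * i;   % A is passed in full to each worker
end
tocBytes(pool)              % display bytes sent to and received from each worker
```

The byte counts reflect what the loop transfers, so they can indicate what a cluster would have to move over the network, even when measured on local workers.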
With local workers, all the MATLAB worker sessions run on the same machine, so a
parfor-loop might not improve execution time at all. Whether it does depends
on many factors, including how many processors and
cores your machine has. You might experiment to see if it is faster to create the arrays