User's Guide

Run mapreduce on a Parallel Pool
In this section...
“Start Parallel Pool”
“Compare Parallel mapreduce”
Start Parallel Pool
If you have Parallel Computing Toolbox installed, execution of mapreduce can open
a parallel pool on the cluster specified by your default profile, for use as the execution
environment.
You can set your parallel preferences so that a pool does not automatically open. In this
case, you must explicitly start a pool if you want mapreduce to use it for parallelization
of its work. See “Parallel Preferences”.
For example, the following conceptual code starts a pool of n workers and later passes
that open pool to mapreducer to set the execution environment.
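% n, tds, and the map and reduce functions are placeholders defined elsewhere.
p = parpool('local',n);   % start a pool of n workers on the local cluster
mr = mapreducer(p);       % use the open pool as the execution environment
outds = mapreduce(tds,@MeanDistMapFun,@MeanDistReduceFun,mr)   % run on the pool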
Note: mapreduce can run on any cluster that supports parallel pools. The examples
in this topic use a local cluster, which works for all Parallel Computing Toolbox
installations.
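When automatic pool creation is turned off, a script can check for an open pool before configuring mapreducer. The following is a minimal sketch of that pattern, not the only way to do it. It assumes the 'local' profile used in this topic is available; substitute your own profile name if it differs.

```matlab
% Reuse an open pool if one exists; otherwise start one explicitly.
p = gcp('nocreate');        % returns empty if no pool is open; never starts one
if isempty(p)
    p = parpool('local');   % assumption: 'local' profile, default worker count
end

mr = mapreducer(p);         % make the pool the mapreduce execution environment
fprintf('Pool has %d workers.\n', p.NumWorkers);
```

Because gcp('nocreate') never opens a pool itself, this check behaves the same way regardless of the automatic-creation preference.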
Compare Parallel mapreduce
The following example calculates the mean arrival delay from a datastore of airline data.
It first runs mapreduce in the MATLAB client session, and then runs it in parallel on a
local cluster. The mapreducer function explicitly controls the execution environment.
Begin by starting a parallel pool on a local cluster.
p = parpool('local',4);
Starting parallel pool (parpool) using the 'local' profile ... connected to 4 workers.
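With the pool open, the comparison could proceed along the following lines. This is a hedged sketch rather than the guide's own listing: the map and reduce functions (delayMapper, delayReducer), the variable names, and the key names are hypothetical, and the code assumes the airlinesmall.csv sample file that ships with MATLAB is on the path. Save it as a script file so that the local functions at the end are available.

```matlab
% Datastore over the sample airline data, reading only the arrival delays.
ds = datastore('airlinesmall.csv', 'TreatAsMissing', 'NA', ...
    'SelectedVariableNames', 'ArrDelay');

% Reuse the open pool; start one only if none exists.
p = gcp('nocreate');
if isempty(p)
    p = parpool('local', 4);
end

inClient = mapreducer(0);   % 0 selects the MATLAB client session (no pool)
inPool   = mapreducer(p);   % p selects the parallel pool

% Same data and functions, two execution environments.
outClient = mapreduce(ds, @delayMapper, @delayReducer, inClient);
outPool   = mapreduce(ds, @delayMapper, @delayReducer, inPool);

meanClient = readall(outClient);   % table with Key and Value variables
meanPool   = readall(outPool);
disp(meanClient.Value{1});
disp(meanPool.Value{1});

function delayMapper(data, ~, intermKVStore)
    % Emit [count, sum] of the non-missing delays in this chunk.
    delays = data.ArrDelay(~isnan(data.ArrDelay));
    add(intermKVStore, 'CountAndSum', [numel(delays), sum(delays)]);
end

function delayReducer(~, intermValIter, outKVStore)
    % Combine the per-chunk [count, sum] pairs into a single mean.
    total = [0, 0];
    while hasnext(intermValIter)
        total = total + getnext(intermValIter);
    end
    add(outKVStore, 'MeanArrivalDelay', total(2) / total(1));
end
```

Only the fourth argument to mapreduce changes between the two runs, so the mapreducer object alone decides where the work executes.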