User's Guide

Run mapreduce on a Parallel Pool

In this section...

“Start Parallel Pool”

“Compare Parallel mapreduce”

Start Parallel Pool

If you have Parallel Computing Toolbox installed, calling mapreduce can automatically open a parallel pool on the cluster specified by your default cluster profile and use that pool as its execution environment.

You can set your parallel preferences so that a pool does not open automatically. In that case, you must start a pool explicitly if you want mapreduce to use it to parallelize its work. See “Parallel Preferences”.
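With automatic pool creation turned off, a script can check for an open pool before starting one. The following is a minimal sketch that assumes only an installed Parallel Computing Toolbox: gcp('nocreate') returns the current pool (or an empty result) without ever creating one, and parpool with no arguments starts a pool on the cluster of your default profile.

```matlab
% Reuse the current pool if one is open; otherwise start one explicitly.
p = gcp('nocreate');          % never creates a pool, even if auto-creation is on
if isempty(p)
    p = parpool;              % default profile's cluster and pool size
end

% Pass mr as the last input of mapreduce to run on this pool
mr = mapreducer(p);
fprintf('mapreduce will run on a pool with %d workers\n', p.NumWorkers);
```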

For example, the following conceptual code starts a pool and, some time later, uses that open pool to configure the mapreduce execution environment with mapreducer.

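n = 4;                        % number of workers (example value)
p = parpool('local',n);       % start a pool of n workers on the local cluster
mr = mapreducer(p);           % make the open pool the execution environment
outds = mapreduce(tds,@MeanDistMapFun,@MeanDistReduceFun,mr)   % tds: an existing datastore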
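The conceptual code above refers to MeanDistMapFun and MeanDistReduceFun without showing them. The script below is a hypothetical sketch of what such a pair might look like, not the functions from the original example: sumCountMap and meanReduce are illustrative names, and the script assumes the airlinesmall.csv sample file that ships with MATLAB and a release that supports local functions in scripts (R2016b or later).

```matlab
% Hypothetical sketch: sumCountMap and meanReduce are illustrative stand-ins,
% not the MeanDistMapFun/MeanDistReduceFun functions of the original example.
% Assumes the airlinesmall.csv sample file that ships with MATLAB.
ds = tabularTextDatastore('airlinesmall.csv', ...
    'TreatAsMissing','NA', 'SelectedVariableNames','Distance');

% Reuse an open pool, or start one on the default profile's cluster
p = gcp('nocreate');
if isempty(p)
    p = parpool;
end
mr = mapreducer(p);

outds = mapreduce(ds, @sumCountMap, @meanReduce, mr);
result = readall(outds);      % table with Key and Value variables
fprintf('Mean distance: %.2f\n', result.Value{1});

function sumCountMap(data, ~, intermKVStore)
    % Emit this chunk's partial sum and count under one shared key
    d = data.Distance(~isnan(data.Distance));
    add(intermKVStore, 'MeanDistance', [sum(d), numel(d)]);
end

function meanReduce(intermKey, intermValIter, outKVStore)
    % Add up the partial [sum, count] pairs, then divide once
    total = [0 0];
    while hasnext(intermValIter)
        total = total + getnext(intermValIter);
    end
    add(outKVStore, intermKey, total(1) / total(2));
end
```

Emitting [sum, count] pairs instead of per-chunk means keeps the result correct when chunks differ in size, which matters because you do not control how the datastore is split among workers. If the workers cannot resolve the local functions, save them as separate function files on the path.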

Note: mapreduce can run on any cluster that supports parallel pools. The examples in this topic use a local cluster, which is available in all Parallel Computing Toolbox installations.
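Because mapreduce works with any pool, targeting a different cluster comes down to starting the pool there. A minimal sketch, assuming the cluster is described by a cluster profile: parcluster with no arguments returns the cluster of your default profile, and parpool accepts that cluster object. The profile name 'MyCluster' in the comment is hypothetical.

```matlab
% Only one pool can be open per client session, so close any existing one.
delete(gcp('nocreate'));

c = parcluster;          % or parcluster('MyCluster'), a hypothetical profile name
p = parpool(c);          % start a pool on that cluster
mr = mapreducer(p);      % mapreduce calls given mr run on this pool
fprintf('Pool started with %d workers\n', p.NumWorkers);
```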

Compare Parallel mapreduce

The following example calculates the mean arrival delay from a datastore of airline data. First it runs mapreduce in the MATLAB client session, and then it runs mapreduce in parallel on a local cluster. In both cases, the mapreducer function explicitly sets the execution environment.
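The client-versus-pool comparison described here can be sketched as a single script. This is an illustrative sketch, not the code of the example itself: delayMap and delayReduce are hypothetical helpers, and the script assumes the airlinesmall.csv sample file that ships with MATLAB and R2016b or later for local functions in scripts. mapreducer(0) selects serial execution in the MATLAB client session, while mapreducer(p) selects the pool p.

```matlab
% Hypothetical sketch: delayMap and delayReduce are illustrative helpers.
% Assumes the airlinesmall.csv sample file that ships with MATLAB.
ds = tabularTextDatastore('airlinesmall.csv', ...
    'TreatAsMissing','NA', 'SelectedVariableNames','ArrDelay');

% Serial run: mapreducer(0) uses only the MATLAB client session
tic
outSerial = mapreduce(ds, @delayMap, @delayReduce, mapreducer(0), 'Display','off');
tSerial = toc;

% Parallel run: reuse an open pool or start one on the default profile
p = gcp('nocreate');
if isempty(p)
    p = parpool;
end
reset(ds);                    % start reading from the beginning again
tic
outPool = mapreduce(ds, @delayMap, @delayReduce, mapreducer(p), 'Display','off');
tPool = toc;

serialTbl = readall(outSerial);
poolTbl = readall(outPool);
serialMean = serialTbl.Value{1};
poolMean = poolTbl.Value{1};
fprintf('Client: mean delay %.4f min in %.2f s\n', serialMean, tSerial);
fprintf('Pool:   mean delay %.4f min in %.2f s\n', poolMean, tPool);

function delayMap(data, ~, intermKVStore)
    % Partial [sum, count] of the non-missing delays in this chunk
    d = data.ArrDelay(~isnan(data.ArrDelay));
    add(intermKVStore, 'MeanArrDelay', [sum(d), numel(d)]);
end

function delayReduce(intermKey, intermValIter, outKVStore)
    % Combine all partial sums and counts into one mean
    total = [0 0];
    while hasnext(intermValIter)
        total = total + getnext(intermValIter);
    end
    add(outKVStore, intermKey, total(1) / total(2));
end
```

The two means should agree, because both runs combine the same partial sums. On a file as small as airlinesmall.csv, the overhead of distributing work to the pool can outweigh the gain, so the parallel run is not guaranteed to be faster.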

Begin by starting a parallel pool on a local cluster.

p = parpool('local',4);

Starting parallel pool (parpool) using the 'local' profile ... connected to 4 workers.