User's Guide

Run mapreduce on a Parallel Pool

readall(meanDelay)

           Key             Value
    __________________    ________

    'MeanArrivalDelay'    [7.1201]

Then, run the calculation on the current parallel pool. Note that the output text indicates a parallel mapreduce.

meanDelay = mapreduce(ds,@meanArrivalDelayMapper,@meanArrivalDelayReducer,inPool);

Parallel mapreduce execution on the parallel pool:
********************************
*      MAPREDUCE PROGRESS      *
********************************
Map   0% Reduce   0%
Map 100% Reduce  50%
Map 100% Reduce 100%

readall(meanDelay)

           Key             Value
    __________________    ________

    'MeanArrivalDelay'    [7.1201]

With this relatively small data set, the parallel pool is unlikely to improve performance. The purpose of this example is to show the mechanism for running mapreduce on a parallel pool. As the data set grows, or as the map and reduce functions themselves become more computationally intensive, you might see improved performance with the parallel pool compared to running mapreduce in the MATLAB client session.
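One way to check whether the parallel pool helps for a given data set is to time both runs. The following is a minimal sketch, not part of the original example. It assumes that ds, meanArrivalDelayMapper, and meanArrivalDelayReducer are already defined as earlier in this example and that a parallel pool can be opened. The variable names inClient, tClient, and tPool are introduced here for illustration only.

```matlab
% Assumption: ds and the mapper/reducer functions exist from earlier steps.
inClient = mapreducer(0);     % execute mapreduce in the MATLAB client session
inPool   = mapreducer(gcp);   % execute mapreduce on the current parallel pool

% Time the run in the client session (progress display suppressed).
tic
mapreduce(ds, @meanArrivalDelayMapper, @meanArrivalDelayReducer, ...
    inClient, 'Display', 'off');
tClient = toc;

% Time the same calculation on the parallel pool.
tic
mapreduce(ds, @meanArrivalDelayMapper, @meanArrivalDelayReducer, ...
    inPool, 'Display', 'off');
tPool = toc;

fprintf('Client session: %.2f s, parallel pool: %.2f s\n', tClient, tPool);
```

A single timing is noisy, and the first run on a pool can include startup overhead, so repeating the measurement gives a more reliable comparison.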

Note: When running parallel mapreduce on a cluster, the order of the key-value pairs in the output can differ from the order produced by running mapreduce in MATLAB. If your application depends on the arrangement of data in the output, you must sort the data according to your own requirements.
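As an illustration of sorting the output, the sketch below orders key-value pairs by key with sortrows. The readall output shown earlier is a table with Key and Value variables. To keep the sketch self-contained, it builds a small stand-in table with hypothetical keys and values rather than reading a real mapreduce result.

```matlab
% Stand-in for the table returned by readall on a mapreduce output
% datastore. The keys and values here are hypothetical.
result = table({'KeyB'; 'KeyA'; 'KeyC'}, {2; 1; 3}, ...
    'VariableNames', {'Key', 'Value'});

% Sort the rows by key so the arrangement no longer depends on
% where or how mapreduce was executed.
sorted = sortrows(result, 'Key');
disp(sorted)
```

With a real result, the same call would apply to the table returned by readall, for example sortrows(readall(meanDelay), 'Key').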