User's Guide

6 Programming Overview
Partition a Datastore in Parallel
Partitioning a datastore in parallel, with a portion of the datastore on each worker in a
parallel pool, can provide benefits in many cases:
• Perform some action on only one part of the whole datastore, or on several defined
  parts simultaneously.
• Search for specific values in the datastore, with all workers acting simultaneously on
  their own partitions.
• Perform a reduction calculation on the workers across all partitions.
This example shows how to use partition to parallelize the reading of data from a
datastore. It uses a small datastore of airline data provided in MATLAB, and finds the
mean of the non-NaN values from its 'ArrDelay' column.
A simple way to calculate the mean is to divide the sum of all the non-NaN values by the
number of non-NaN values. The following code does this for the datastore, first without
parallel execution. To begin, define a function that accumulates the count and sum. If
you want to run this example, copy and save this function in a folder on the MATLAB
command search path.
function [total,count] = sumAndCountArrivalDelay(ds)
    % Accumulate the sum and the count of non-NaN ArrDelay values in ds.
    total = 0;
    count = 0;
    while hasdata(ds)
        data = read(ds);   % read the next chunk as a table
        total = total + sum(data.ArrDelay,1,'OmitNaN');
        count = count + sum(~isnan(data.ArrDelay));
    end
end
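The accumulation pattern in the function above can be checked on a small in-memory vector before running it against a datastore. The following is a minimal sketch, not part of the original example: the vector x and the chunk length chunkSize are hypothetical stand-ins for the 'ArrDelay' column and for the amount of data returned by one call to read. It suggests that summing and counting chunk by chunk gives the same mean as a single call to mean with the 'omitnan' flag.

```matlab
% Hypothetical in-memory data standing in for the ArrDelay column.
x = [5; NaN; -3; 12; NaN; 7; 0; 4];
chunkSize = 3;   % assumed chunk length, analogous to one read(ds) call

total = 0;
count = 0;
for first = 1:chunkSize:numel(x)
    last  = min(first + chunkSize - 1, numel(x));
    chunk = x(first:last);
    total = total + sum(chunk,1,'omitnan');   % NaN entries are skipped
    count = count + sum(~isnan(chunk));       % count only non-NaN entries
end

chunkedMean = total/count;         % mean built from per-chunk partial results
directMean  = mean(x,'omitnan');   % reference value computed in one step
```

Because the total and the count are simple sums, the partial results from separate chunks can be combined in any order, which is what makes this calculation a natural candidate for a reduction across partitions.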
The following code creates a datastore, calls the function, and calculates the mean
without any parallel execution. The tic and toc functions are used to time the
execution, here and in the later parallel cases.
ds = datastore(repmat({'airlinesmall.csv'},20,1),'TreatAsMissing','NA');
ds.SelectedVariableNames = 'ArrDelay';
reset(ds);
tic