User's Guide

Analyze Large Data Sets in a Database with MapReduce
This example shows how to analyze large data sets that are stored in a database. You
can access large data sets using a DatabaseDatastore object with Database Toolbox.
After creating a DatabaseDatastore, you can run algorithms on large data sets by
integrating with MapReduce.
This example uses MapReduce to calculate the mean arrival delay of a large flight
data set that is stored in a database. This example modifies the “Compute Mean
Value with MapReduce” example to use a DatabaseDatastore instead of a
TabularTextDatastore. You can similarly modify other MATLAB examples that
analyze data using MapReduce as described in Building Effective Algorithms with
MapReduce.
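To make the idea concrete, here is a minimal sketch in plain MATLAB of one common way to compute a mean in map and reduce steps. It does not call mapreduce or touch a database. The delay values, the chunk layout, and the names chunks, mapChunk, partials, and totals are made up for illustration. Each chunk stands in for one block of data that a datastore would pass to a map function.

```matlab
% Hypothetical arrival delays (minutes), split into chunks the way a
% datastore might hand them to a map function. NaN marks a missing value.
chunks = {[10; -5; NaN; 20], [0; 15], [NaN; 30; -10]};

% Map step: each chunk emits a partial [count, sum], ignoring NaN values.
mapChunk = @(x) [nnz(~isnan(x)), sum(x(~isnan(x)))];
partials = cellfun(mapChunk, chunks, 'UniformOutput', false);

% Reduce step: add the partial results, then divide once at the end.
totals = sum(vertcat(partials{:}), 1);   % [total count, total sum]
meanArrDelay = totals(2) / totals(1);
```

Carrying a count and a sum, rather than a mean per chunk, keeps the result correct when chunks have different sizes.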
Create the DatabaseDatastore
A datastore returns data as a table by default. For consistency, set the database
preference 'DataReturnFormat' to 'table' so that data imported from the database is
also returned as a table.
setdbprefs('DataReturnFormat','table')
The file airlinesmall.csv contains the large flight data set. Load this file into a
MySQL database table named flightdelay. This table contains 123,523 records.
Create a database connection conn using the JDBC driver. Use the Vendor name-
value pair argument of database to specify a connection to a MySQL database. This
code assumes you are connecting to a database named dbname on a database server
named sname with user name username and password pwd. dbname contains the table
flightdelay.
conn = database('dbname','username','pwd',...
'Vendor','MySQL',...
'Server','sname');
Create a DatabaseDatastore object dbds using the database connection conn and SQL
query sqlquery. This SQL query retrieves flight arrival delay data ArrDelay from the
table flightdelay.
sqlquery = 'select ArrDelay from flightdelay';
dbds = datastore(conn,sqlquery);
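Before running a long computation, it can help to confirm that the query returns what you expect. The following sketch assumes that dbds was created as shown above over a live connection. The variable name T is illustrative, and the rows displayed depend on the contents of your database.

```matlab
% Look at the first few rows of the query result without reading
% the entire data set into memory.
T = preview(dbds);
disp(T)
```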