
Performance Considerations for Streams and Nodes

The following operations cannot be performed in most databases. They should be placed in the
stream after the operations in the preceding list:

• Operations on any nondatabase data, such as flat files
• Merge by order
• Balance
• Distinct operations in discard mode or where only a subset of fields are selected as distinct
• Any operation that requires accessing data from records other than the one being processed
• State and count field derivations
• History node operations
• Operations involving “@” (time-series) functions
• Type-checking modes Warn and Abort
• Model construction, application, and analysis
  Note: Decision trees, rulesets, linear regression, and factor-generated models can generate
  SQL and can therefore be pushed back to the database.
• Data output to anywhere other than the same database that is processing the data
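
The ordering advice above can be illustrated with a minimal Python sketch. It is not part of the product and uses no product API: the operation names are hypothetical labels, the NOT_PUSHABLE set is only an illustrative subset of the list above, and the sketch assumes that anything not in that set can run in the database and that everything downstream of the first non-database operation runs outside the database.

```python
# Hypothetical labels for a few of the operations listed above (illustrative subset only).
NOT_PUSHABLE = {"flat_file_source", "merge_by_order", "balance", "history", "type_check_abort"}


def first_pushback_break(stream):
    """Return the index of the first operation that cannot run in the database, or None."""
    for index, operation in enumerate(stream):
        if operation in NOT_PUSHABLE:
            return index
    return None


def stranded_operations(stream):
    """List operations assumed to be database-capable that sit after a non-database one."""
    break_index = first_pushback_break(stream)
    if break_index is None:
        return []
    return [op for op in stream[break_index + 1:] if op not in NOT_PUSHABLE]


# A stream in which Balance comes too early: the aggregate and sort that follow it
# are reported as candidates to move upstream.
example_stream = ["database_source", "select", "balance", "aggregate", "sort"]
stranded = stranded_operations(example_stream)
```

The sketch only reports candidates. Whether an operation can actually be moved ahead of another without changing the results depends on the stream and has to be judged case by case.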

Node Caches

To optimize stream running, you can set up a cache on any nonterminal node. When you set up a
cache on a node, the cache is filled with the data that passes through the node the next time you
run the data stream. From then on, the data is read from the cache (which is stored on disk in a
temporary directory) rather than from the data source.

Caching is most useful following a time-consuming operation such as a sort, merge, or
aggregation. For example, suppose that you have a source node set to read sales data from a
database and an Aggregate node that summarizes sales by location. You can set up a cache on the
Aggregate node rather than on the source node because you want the cache to store the aggregated
data rather than the entire data set.
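
The caching behavior described above can be pictured with a short, self-contained Python sketch. It is a conceptual illustration only, not how the product implements caching: the CachedNode class, the read_sales and aggregate_by_location functions, and the sales rows are all hypothetical. The first run fills a cache file in a temporary directory, and later runs read from that file instead of repeating the aggregation.

```python
import os
import pickle
import tempfile
from collections import defaultdict


class CachedNode:
    """Wraps a processing step: the first run fills a disk cache, later runs read it."""

    def __init__(self, name, compute, cache_dir):
        self.compute = compute
        self.path = os.path.join(cache_dir, name + ".cache")
        self.computed_runs = 0  # how many times the expensive step actually ran

    def run(self):
        if os.path.exists(self.path):
            # Cache already filled: read from disk instead of recomputing.
            with open(self.path, "rb") as cache_file:
                return pickle.load(cache_file)
        result = self.compute()
        self.computed_runs += 1
        with open(self.path, "wb") as cache_file:
            pickle.dump(result, cache_file)
        return result


def read_sales():
    """Stand-in for a database source node; the rows are made-up sample data."""
    return [("North", 10.0), ("South", 5.0), ("North", 20.0)]


def aggregate_by_location(rows):
    """Stand-in for an Aggregate node: total sales per location."""
    totals = defaultdict(float)
    for location, amount in rows:
        totals[location] += amount
    return dict(totals)


cache_dir = tempfile.mkdtemp()
# The cache sits on the aggregate step, so it stores the summary, not the raw rows.
aggregate_node = CachedNode("aggregate", lambda: aggregate_by_location(read_sales()), cache_dir)
first_run = aggregate_node.run()   # computes the totals and fills the cache
second_run = aggregate_node.run()  # reads the totals back from the cache file
```

Placing the cache on the aggregate step rather than on read_sales mirrors the example in the text: the cache file holds the small summarized result, and the source is not read again on the second run.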

Note: Caching at source nodes, which simply stores a copy of the original data as it is read into
IBM® SPSS® Modeler, will not improve performance in most circumstances.

Nodes with caching enabled are displayed with a small document icon at the top right corner.
When the data is cached at the node, the document icon is green.