User's Manual

Chapter 13
Performance Considerations for Streams and Nodes
You can design your streams to maximize performance by arranging the nodes in the most
efficient configuration, by enabling node caches when appropriate, and by paying attention to
other considerations as detailed in this section.
Aside from the considerations discussed here, additional and more substantial performance
improvements can typically be gained by making effective use of your database, particularly
through SQL optimization.
Order of Nodes
Even when you are not using SQL optimization, the order of nodes in a stream can affect
performance. The general goal is to minimize downstream processing; therefore, when you
have nodes that reduce the amount of data, place them near the beginning of the stream. IBM®
SPSS® Modeler Server can apply some reordering rules automatically during compilation to bring
forward certain nodes when it can be proven safe to do so. (This feature is enabled by default.
Check with your system administrator to make sure it is enabled in your installation.)
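The effect of node order can be illustrated with a small, generic sketch. It does not use any SPSS Modeler API; the `run_stream`, `select`, and `derive` names and the sample records are hypothetical stand-ins, assumed only for illustration. The sketch counts how many records each step receives when a data-reducing step runs last versus first.

```python
def run_stream(records, steps):
    """Run steps in order; return the output and how many records each step received."""
    seen = []
    for name, step in steps:
        seen.append((name, len(records)))
        records = step(records)
    return records, seen


def select(records):
    # Hypothetical data-reducing step: keep one record in ten.
    return [r for r in records if r["value"] % 10 == 0]


def derive(records):
    # Hypothetical per-record step that adds a new field.
    return [{**r, "double": r["value"] * 2} for r in records]


records = [{"id": i, "value": i} for i in range(1000)]

# Data-reducing step placed last: derive processes every record.
late_out, late_seen = run_stream(records, [("derive", derive), ("select", select)])

# Data-reducing step placed first: derive processes only the selected records.
early_out, early_seen = run_stream(records, [("select", select), ("derive", derive)])

print(late_seen)   # [('derive', 1000), ('select', 1000)]
print(early_seen)  # [('select', 1000), ('derive', 100)]
```

In this sketch both orderings produce the same output because the selection condition does not depend on the derived field. Reordering is only safe under conditions like that one.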
When using SQL optimization, you want to maximize its availability and efficiency. Since
optimization halts when the stream contains an operation that cannot be performed in the database,
it is best to group SQL-optimized operations together at the beginning of the stream. This strategy
keeps more of the processing in the database, so less data is carried into IBM® SPSS® Modeler.
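The grouping advice can be sketched with a simplified model. This is not the algorithm that the product uses; it assumes only the behavior described above, that optimization stops at the first operation the database cannot perform. The `DB_SUPPORTED` set, the operation names, and `split_at_first_unsupported` are hypothetical.

```python
# Hypothetical operation names assumed to be supported by the database.
DB_SUPPORTED = {"select", "aggregate", "sort", "merge", "sample", "append"}


def split_at_first_unsupported(operations):
    """Split a stream into the leading run of database-supported operations
    and everything from the first unsupported operation onward."""
    for i, op in enumerate(operations):
        if op not in DB_SUPPORTED:
            return operations[:i], operations[i:]
    return list(operations), []


# An unsupported operation early in the stream stops optimization early.
scattered = ["select", "custom_transform", "aggregate", "sort"]
# The same operations with the supported ones grouped at the beginning.
grouped = ["select", "aggregate", "sort", "custom_transform"]

print(split_at_first_unsupported(scattered))
# (['select'], ['custom_transform', 'aggregate', 'sort'])
print(split_at_first_unsupported(grouped))
# (['select', 'aggregate', 'sort'], ['custom_transform'])
```

In the scattered ordering only one operation stays in the database under this model, while the grouped ordering keeps three there. Whether a particular stream can be regrouped without changing its results depends on the operations involved.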
The following operations can be done in most databases. Try to group them at the beginning of
the stream:
Merge by key (join)
Select
Aggregate
Sort
Sample
Append
Distinct operations in include mode, in which all fields are selected
Filler operations
Basic derive operations using standard arithmetic or string manipulation (depending on which operations are supported by the database)
Set-to-ag
© Copyright IBM Corporation 1994, 2012.