User's Manual
62
Chapter 5
reduce network traffic and speed stream operations. Note that the Generate SQL check box
must be selected for SQL optimization to have any effect.
Optimize syntax execution. This method of stream rewriting increases the efficiency of
operations that incorporate more than one node containing IBM® SPSS® Statistics syntax.
Optimization is achieved by combining the syntax commands into a single operation, instead
of running each as a separate operation.
Optimize other execution. This method of stream rewriting increases the efficiency of
operations that cannot be delegated to the database. Optimization is achieved by reducing the
amount of data in the stream as early as possible. While maintaining data integrity, the stream
is rewritten to push operations closer to the data source, thus reducing data downstream for
costly operations, such as joins.
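The effect of reducing data before a costly join can be sketched in a few lines of Python. This is only an illustration of the general idea, not the rewriting that SPSS Modeler performs; the data, the field names, and the join helper are all hypothetical.

```python
# Hypothetical records; none of these names come from SPSS Modeler.
customers = [{"id": i, "region": "north" if i % 4 == 0 else "south"}
             for i in range(1000)]
orders = [{"customer_id": i % 1000, "amount": i} for i in range(5000)]


def join(left, right):
    """Inner-join customers to orders on id.

    Returns the joined rows and the number of input records fed to the join.
    """
    by_id = {c["id"]: c for c in left}
    rows = [{**by_id[o["customer_id"]], **o}
            for o in right if o["customer_id"] in by_id]
    return rows, len(left) + len(right)


# Stream as written: join everything, then select the north region.
joined, cost_late = join(customers, orders)
late = [r for r in joined if r["region"] == "north"]

# Rewritten stream: select first, so the join receives fewer records.
north = [c for c in customers if c["region"] == "north"]
early, cost_early = join(north, orders)

# Both orderings return the same records; only the work done differs.
print(late == early, cost_late, cost_early)
```

The result is unchanged, which is the sense in which data integrity is maintained, while the join handles fewer input records.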
Enable parallel processing. When running on a computer with multiple processors, this option
allows the system to balance the load across those processors, which may result in faster
performance. Use of multiple nodes or use of the following individual nodes may benefit from
parallel processing: C5.0, Merge (by key), Sort, Bin (rank and tile methods), and Aggregate
(using one or more key fields).
Generate SQL. Select this option to enable SQL generation, allowing stream operations to be pushed
back to the database by using SQL code to generate execution processes, which may improve
performance. To further improve performance, Optimize SQL generation can also be selected to
maximize the number of operations pushed back to the database. When operations for a node have
been pushed back to the database, the node will be highlighted in purple when the stream is run.
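To make the idea of pushback concrete, the following Python sketch contrasts processing records on the client with running the same select and aggregate steps inside the database as a single SQL statement. It uses an in-memory SQLite database purely for illustration. The table, the field names, and the SQL text are assumptions and do not show the SQL that SPSS Modeler actually generates.

```python
import sqlite3

# Illustrative in-memory database; table and field names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10.0), ("south", 5.0), ("north", 20.0),
     ("south", 1.0), ("east", 50.0)],
)

# Without pushback: every row is fetched, then selected and aggregated
# on the client side.
totals_client = {}
for region, amount in conn.execute("SELECT region, amount FROM sales"):
    if amount > 2.0:
        totals_client[region] = totals_client.get(region, 0.0) + amount

# With pushback: the select and aggregate steps run in the database as one
# statement, so only the aggregated rows travel back to the client.
pushed_sql = (
    "SELECT region, SUM(amount) FROM sales "
    "WHERE amount > 2.0 GROUP BY region"
)
totals_db = dict(conn.execute(pushed_sql).fetchall())

print(totals_client == totals_db)
```

In the pushed-back form only one row per region is returned, which is where the reduction in network traffic comes from.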
Database caching. For streams that generate SQL to be executed in the database, data can be
cached midstream to a temporary table in the database rather than to the file system. When
combined with SQL optimization, this may result in significant gains in performance. For
example, the output from a stream that merges multiple tables to create a data mining view
may be cached and reused as needed. With database caching enabled, simply right-click any
nonterminal node to cache data at that point, and the cache is automatically created directly in
the database the next time the stream is run. This allows SQL to be generated for downstream
nodes, further improving performance. Alternatively, this option can be disabled if needed,
such as when policies or permissions preclude data being written to the database. If database
caching or SQL optimization is not enabled, the cache will be written to the file system
instead. For more information, see the topic Caching Options for Nodes on p. 50.
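The merge example above can be sketched with a temporary table. This Python snippet uses SQLite only as a stand-in database; the table names, the cache name cached_view, and the SQL are hypothetical and are not what SPSS Modeler creates.

```python
import sqlite3

# Illustrative in-memory database with two hypothetical source tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'north'), (2, 'south');
    INSERT INTO orders VALUES (1, 10.0), (1, 15.0), (2, 7.0);
""")

# Cache the merged view midstream as a temporary table inside the database,
# rather than exporting it to the file system.
conn.execute("""
    CREATE TEMP TABLE cached_view AS
    SELECT c.id, c.region, o.amount
    FROM customers c JOIN orders o ON o.customer_id = c.id
""")

# Downstream steps reuse the cache and can still be expressed as SQL,
# without repeating the merge.
totals = dict(conn.execute(
    "SELECT region, SUM(amount) FROM cached_view GROUP BY region"))
row_count = conn.execute("SELECT COUNT(*) FROM cached_view").fetchone()[0]

print(totals, row_count)
```

Because the cached data stays in the database, later operations on it remain candidates for SQL generation instead of being processed from a file-system cache.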
Use relaxed conversion. This option enables the conversion of data from either strings to
numbers, or numbers to strings, if stored in a suitable format. For example, if the data is
kept in the database as a string, but actually contains a meaningful number, the data can be
converted for use when the pushback occurs.
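As a rough illustration of a string-to-number conversion during pushback, the sketch below stores numeric readings in a text column and converts them inside the SQL statement. The use of CAST, the SQLite database, and all table and field names are assumptions made for this example; the conversion that SPSS Modeler generates may differ.

```python
import sqlite3

# Illustrative in-memory database; "value" holds numbers stored as strings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("a", "12.5"), ("a", "7.5"), ("b", "3")],
)

# The string field is converted to a number inside the database, so the
# aggregate can still be computed there rather than on the client.
rows = conn.execute(
    "SELECT sensor, SUM(CAST(value AS REAL)) "
    "FROM readings GROUP BY sensor ORDER BY sensor"
).fetchall()

print(rows)
```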
Note: Due to minor differences in SQL implementation, streams run in a database may return
slightly different results from those returned when run in SPSS Modeler. For similar reasons, these
differences may also vary depending on the database vendor.
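One generic example of such an implementation difference is integer division. The sketch below compares SQLite with Python; neither stands in for SPSS Modeler's own engine, and the example is only meant to show how the same expression can evaluate differently depending on where it runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# SQLite divides two integers with integer division.
in_database = conn.execute("SELECT 7 / 2").fetchone()[0]

# Python's / operator always performs true division.
in_client = 7 / 2

print(in_database, in_client)
```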
Save As Default. The options specified apply only to the current stream. Click this button to set
these options as the default for all streams.