Aggregate. When the Keys are contiguous option is not set, this node reads (but does not store) its entire input data set before it produces any aggregated output. In the more extreme situations, where the size of the aggregated data reaches a limit (determined by the SPSS Modeler Server configuration option Memory usage multiplier), the remainder of the data set is sorted and processed as if the Keys are contiguous option were set. When this option is set, no data is stored because the aggregated output records are produced as the input data is read.
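The following Python sketch is not the node's implementation; it only illustrates why contiguous keys allow aggregated records to be emitted while the input is still being read, whereas unordered keys force every partial aggregate to be held until the end. The record layout and field accessors are made up for illustration.

    import itertools

    def aggregate_contiguous(records, key, value):
        """Keys arrive contiguously: each group is summed and emitted as
        soon as its key changes, so no aggregated records are stored."""
        for k, group in itertools.groupby(records, key=key):
            yield k, sum(value(r) for r in group)

    def aggregate_unordered(records, key, value):
        """Keys arrive in any order: every partial aggregate must be kept
        until the whole input has been read."""
        totals = {}
        for r in records:
            k = key(r)
            totals[k] = totals.get(k, 0) + value(r)
        return totals

    rows = [("a", 1), ("a", 2), ("b", 5), ("b", 1)]
    print(list(aggregate_contiguous(rows, key=lambda r: r[0], value=lambda r: r[1])))
    print(aggregate_unordered(rows, key=lambda r: r[0], value=lambda r: r[1]))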
Distinct. The Distinct node stores all of the unique key fields in the input data set; in cases where all fields are key fields and all records are unique, it stores the entire data set. By default the Distinct node sorts the data on the key fields and then selects (or discards) the first distinct record from each group. For smaller data sets with a low number of distinct keys, or those that have been pre-sorted, you can choose options to improve the speed and efficiency of processing.
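As a rough illustration of the default behavior described above (sort on the key fields, then keep the first record of each group), the following Python sketch uses invented records and field names; it is not the node's implementation.

    from itertools import groupby
    from operator import itemgetter

    def first_distinct(records, key_fields):
        """Sort on the key fields, then keep only the first record of
        each key group; the remaining records in a group are discarded."""
        keyfn = itemgetter(*key_fields)
        ordered = sorted(records, key=keyfn)
        return [next(group) for _, group in groupby(ordered, key=keyfn)]

    rows = [
        {"id": 2, "region": "east", "value": 10},
        {"id": 1, "region": "west", "value": 7},
        {"id": 2, "region": "east", "value": 12},  # duplicate key, discarded
    ]
    print(first_distinct(rows, key_fields=("id", "region")))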
Type. In some instances, the Type node caches the input data when reading values; the cache is used for downstream processing. The cache requires sufficient disk space to store the entire data set but speeds up processing.
Evaluation. The Evaluation node must sort the input data to compute tiles. The sort is repeated for each model evaluated because the scores and consequent record order are different in each case. The running time is M*N*log(N), where M is the number of models and N is the number of records.
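The hypothetical Python sketch below illustrates where that cost comes from: tile membership is derived from a sort of the scores, and the sort (the N*log(N) part) is repeated once for each of the M models. The model names and score values are invented for illustration.

    def tile_assignments(scores, num_tiles=4):
        """Assign each record to a tile (e.g. quartile) by sorting its
        scores in descending order; the sort is the N*log(N) step."""
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        tiles = [0] * len(scores)
        for rank, i in enumerate(order):
            tiles[i] = rank * num_tiles // len(scores) + 1
        return tiles

    # One sort per model: total work is roughly M * N * log(N) comparisons.
    models_scores = {
        "model_A": [0.9, 0.2, 0.7, 0.4],
        "model_B": [0.1, 0.8, 0.3, 0.6],
    }
    for name, scores in models_scores.items():
        print(name, tile_assignments(scores))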
Performance: Modeling Nodes
Neural Net and Kohonen. Neural network training algorithms (including the Kohonen algorithm) make many passes over the training data. The data is stored in memory up to a limit, and the excess is spilled to disk. Accessing the training data from disk is expensive because the access method is random, which can lead to excessive disk activity. You can disable the use of disk storage for these algorithms, forcing all data to be stored in memory, by selecting the Optimize for speed option on the Model tab of the node's dialog box. Note that if the amount of memory required to store the data is greater than the working set of the server process, part of it will be paged to disk and performance will suffer accordingly.
When Optimize for memory is enabled, a percentage of physical RAM is allocated to the algorithm according to the value of the IBM® SPSS® Modeler Server configuration option Modeling memory limit percentage. To use more memory for training neural networks, either provide more RAM or increase the value of this option, but note that setting the value too high will cause paging.
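As a simple illustration, assuming the option is interpreted as a straight percentage of physical RAM (the server's exact calculation is not shown here), the training memory budget could be estimated as in the Python sketch below; the function name and values are hypothetical.

    def modeling_memory_bytes(physical_ram_bytes, memory_limit_percentage):
        """Rough estimate of the byte budget implied by a percentage-based
        memory limit; the actual server calculation may differ."""
        return physical_ram_bytes * memory_limit_percentage // 100

    ram = 16 * 1024 ** 3                    # 16 GB of physical RAM (example value)
    print(modeling_memory_bytes(ram, 25))   # 25% -> roughly 4 GB for training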
The running time of the neural network algorithms depends on the required level of accuracy.
You can control the running time by setting a stopping condition in the node’s dialog box.
K-Means. The K-Means clustering algorithm has the same options for controlling memory usage as the neural network algorithms. Performance on data stored on disk is better, however, because access to the data is sequential.
Performance: CLEM Expressions
CLEM sequence functions ("@ functions") that look back into the data stream must store enough of the data to satisfy the longest look-back. For operations whose degree of look-back is unbounded, all values of the field must be stored. An unbounded operation is one where the