Installation manual

3.3.2.1.1.1 Samples
Use cache memory - Enables usage of a fingerprint cache (Enabled by default).
Turn on MSF - Enables an alternate fingerprinting algorithm known as MSF. When enabled, you can set the
following limits and thresholds:
Number of messages designating a bulk message: - This option specifies how many similar messages are
required in order to consider a message bulk.
Frequency of clearing cache memory: - This option specifies an internal variable which determines how
frequently the in-memory MSF cache is pruned.
Two samples match sensitivity: - This option specifies the match percentage threshold for two fingerprints. If
the match percentage is higher than this threshold, the two messages are considered to be the same.
Number of samples stored in memory: - This option specifies the number of MSF fingerprints to keep in
memory. The higher the number, the more memory is used but also the higher the accuracy.
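The interplay of the four MSF settings can be illustrated with a minimal sketch. It is not the MSF algorithm, which this manual does not describe: the shingle-based fingerprint, the Jaccard match percentage, and all names (fingerprint, match_percentage, FingerprintCache, bulk_count, match_threshold, max_samples, prune_every) are assumptions made purely for illustration.

```python
from collections import OrderedDict


def fingerprint(text, k=4):
    """Hypothetical fingerprint: the set of k-character shingles of the text."""
    normalized = " ".join(text.lower().split())
    count = max(1, len(normalized) - k + 1)
    return frozenset(normalized[i:i + k] for i in range(count))


def match_percentage(a, b):
    """Jaccard similarity of two fingerprints, expressed as a percentage."""
    union = a | b
    return 100.0 * len(a & b) / len(union) if union else 100.0


class FingerprintCache:
    """Illustrative in-memory cache; parameter names are assumptions."""

    def __init__(self, bulk_count=3, match_threshold=80.0,
                 max_samples=1000, prune_every=100):
        self.bulk_count = bulk_count            # similar messages needed for "bulk"
        self.match_threshold = match_threshold  # two samples match sensitivity
        self.max_samples = max_samples          # samples stored in memory
        self.prune_every = prune_every          # how often the cache is pruned
        self.seen = 0
        self.samples = OrderedDict()            # fingerprint -> times seen

    def observe(self, text):
        """Record a message and return True once it counts as bulk."""
        fp = fingerprint(text)
        match = next((known for known in self.samples
                      if match_percentage(fp, known) > self.match_threshold), None)
        key = fp if match is None else match
        self.samples[key] = self.samples.get(key, 0) + 1
        self.samples.move_to_end(key)           # most recently seen goes last
        self.seen += 1
        if self.seen % self.prune_every == 0:
            while len(self.samples) > self.max_samples:
                self.samples.popitem(last=False)  # drop the oldest sample
        return self.samples.get(key, 0) >= self.bulk_count
```

In this sketch, a lower match_threshold groups more messages together, while a larger max_samples keeps more history at the cost of memory, mirroring the trade-offs described for the options above.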
3.3.2.1.1.2 SpamCompiler
Turn on SpamCompiler - Speeds up rules processing but requires slightly more memory.
Preferred version: - Specifies which SpamCompiler version to use. When set to Automatic, the antispam engine
will choose the best version to use.
Use cache memory - If this option is enabled, SpamCompiler will store the compiled data on disk instead of
memory in order to reduce memory usage.
List of cache memory files: - This option specifies which rule files are compiled on disk instead of in memory.
Enter the indexes of the rule files that will be stored in cache memory on disk. To manage rule file indexes you can:
Add...
Edit...
Remove
NOTE: Only numbers are acceptable characters.
3.3.2.1.2 Training
Use training for message fingerprint score - Enables fingerprint score offset training.
Use training words - This option controls whether Bayesian Word Token analysis is used. Enabling it can greatly
improve accuracy, but it uses more memory and is slightly slower.
Number of words in cache memory: - This option specifies the number of word tokens to cache at any time. The
higher the number, the more memory is used but also the higher the accuracy. To enter the number, first enable
the Use training words option.
Use training database only for reading: - This option controls whether the word, rules, and fingerprint training
databases can be modified or are read-only after the initial load. A read-only training database is faster.
Automatic training sensitivity: - Sets the thresholds for auto-training. A message scored at or above the high
threshold is considered definite spam. A message scored at or below the low threshold is considered definite
ham. In either case, the message is then used to train all the enabled Bayesian modules (rules and/or word) but
not sender or fingerprint. To enter the high and low threshold values, first enable the Use training database
only for reading option.
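The auto-training decision described above can be sketched as a simple comparison against the two thresholds. The function name auto_train_label and the example score values are hypothetical; the manual does not state the score scale or the default thresholds.

```python
def auto_train_label(score, low_threshold, high_threshold):
    """Return 'spam', 'ham', or None (message is not used for auto-training).

    Assumption: scores and thresholds share one numeric scale in which
    higher means more spam-like.
    """
    if score >= high_threshold:
        return "spam"   # definite spam: train the enabled Bayesian modules
    if score <= low_threshold:
        return "ham"    # definite ham: train the enabled Bayesian modules
    return None         # in between: too uncertain to learn from
```

Messages that fall between the two thresholds are left out of training, which keeps uncertain classifications from reinforcing themselves.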
Minimum amount of training data: - Initially, only the rule weights are used to compute the spam score. Once a
minimum set of training data is reached, rule/word training data replaces the rule weights. The default minimum
is 100, which means that the engine must be trained on at least 100 equivalent known ham messages and 100
equivalent spam messages, for a total of 200 messages, before the training data replaces the rule weights. If the
number is too low, accuracy could be poor due to insufficient data. If the number is too high, the training data
will not be fully taken advantage of. A value of 0 will cause rule weights to always be ignored.
Use only training data - Controls whether to give full weight to training data. If this option is enabled then scoring
will be based solely on training data. If this option is disabled (unchecked) then both rules and training data will be
used.