User Guide

Processing options
For example, if you want to use a script called fix_bif to add customized information to BIF files,
use the following command:
vspider -cmdfile filename
where filename is the text-only command file that contains the following line, along with any
other options you need:
-processbif 'fix_bif !*'
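As a rough illustration of the command-file approach, the following Python sketch writes a command file holding the -processbif line and builds the matching vspider argument list. The file name spider_options.txt and its location are hypothetical, and the sketch only prints the command rather than running it.

```python
import shlex
import tempfile
from pathlib import Path

# Hypothetical location for the command file; any writable path would do.
cmdfile = Path(tempfile.gettempdir()) / "spider_options.txt"

# The -processbif line is copied from the example in the text; a real
# command file would also carry whatever other options the job needs.
cmdfile.write_text("-processbif 'fix_bif !*'\n", encoding="ascii")

# Build the vspider invocation as an argument list (not executed here).
argv = ["vspider", "-cmdfile", str(cmdfile)]
print(shlex.join(argv))
```

Keeping the quoted option in a file rather than on the command line avoids a second layer of shell quoting around 'fix_bif !*'.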
-regexp
Specifies the use of regular expressions rather than the default wildcard expressions for the
following options:
-exclude, -indexclude, -include, -indinclude, -skip, -indskip,
-preferred, and -nofollow.
Wildcard expressions allow the use of the asterisk (*) to match text strings and the question mark (?)
to match single characters, as the table of wildcard expressions in this section shows.
Regular expressions allow more powerful and flexible matching of alphanumeric strings. For
example, to match "ab11" or "ab34" but not "abcd" or "ab11cd", you could use the following
regular expression:
^ab[0-9][0-9]$
The full extent to which regular expressions can be employed is beyond the scope of this
description. For more information on regular expressions, refer to a book devoted to the subject.
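To see how the expression above behaves, here is a minimal sketch using Python's re module. Python's regular expression engine is used only for illustration and is an assumption; Verity Spider's own engine may differ in details, although anchors and character classes like these are widely supported.

```python
import re

# Pattern copied from the text: "ab" followed by exactly two digits,
# anchored at both the start (^) and the end ($) of the string.
pattern = re.compile(r"^ab[0-9][0-9]$")

candidates = ["ab11", "ab34", "abcd", "ab11cd"]
matches = [s for s in candidates if pattern.search(s)]
print(matches)  # prints ['ab11', 'ab34']
```

The anchors are what reject "ab11cd": without the trailing $, the leading "ab11" would be enough to match.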
-submitsize
Syntax:
-submitsize num_documents
Specifies the number of documents submitted for indexing at one time. The default value is 128.
The upper limit is 64,000.
Note: Although larger values mean more efficient processing by the indexer, smaller values allow
more parallelism on multi-CPU systems. In the event of a halt during indexing, a smaller value means
fewer documents will be lost.
If a halt occurs during indexing, the batch of documents submitted under the -submitsize option
is lost, because indexing has no transactional rollback and those documents are no longer in the
indexing queue. When you rerun the indexing task, Verity Spider can continue only with the
URLs and documents that are still enqueued.
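The trade-off described for -submitsize can be sketched with a toy model in Python. This is not how Verity Spider is implemented: the function names batches and lost_on_halt are hypothetical, and the lower bound of 1 is an assumption, since the text states only the default (128) and the upper limit (64,000).

```python
from typing import Iterator, List

DEFAULT_SUBMITSIZE = 128   # default stated in the text
MAX_SUBMITSIZE = 64_000    # upper limit stated in the text


def batches(docs: List[str], submitsize: int = DEFAULT_SUBMITSIZE) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `submitsize` documents."""
    # The lower bound of 1 is an assumption made for this sketch.
    if not 1 <= submitsize <= MAX_SUBMITSIZE:
        raise ValueError("submitsize must be between 1 and 64,000")
    for start in range(0, len(docs), submitsize):
        yield docs[start:start + submitsize]


def lost_on_halt(docs: List[str], submitsize: int, halted_batch: int) -> int:
    """Count documents lost if indexing halts during batch `halted_batch` (0-based)."""
    return len(list(batches(docs, submitsize))[halted_batch])


docs = [f"doc{i}" for i in range(1000)]
print(lost_on_halt(docs, 128, 0))  # prints 128
print(lost_on_halt(docs, 32, 0))   # prints 32
```

In this model the number of documents at risk in a halt is bounded by the batch size, which is why a smaller -submitsize value loses fewer documents at the cost of more submissions.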
Wildcard expression    Matching text strings
a*t                    although, attitude, audit
a?t                    ant, art
file?.htm              files.htm, file1.htm, filer.htm
name?.*                names.txt, named.blank, names.ext
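The ? and * rows of the table can be tried out with Python's fnmatch module, which implements shell-style wildcards. Using fnmatch as a stand-in is an assumption: it matches the pattern against the whole string, and Verity Spider's own matching rules may differ, so the a*t row is left out of this sketch.

```python
from fnmatch import fnmatchcase

# Patterns and sample strings taken from the wildcard table.
examples = {
    "a?t": ["ant", "art"],
    "file?.htm": ["files.htm", "file1.htm", "filer.htm"],
    "name?.*": ["names.txt", "named.blank", "names.ext"],
}

for pattern, strings in examples.items():
    for s in strings:
        # fnmatchcase compares case-sensitively on every platform.
        print(pattern, s, fnmatchcase(s, pattern))

# "?" stands for exactly one character, so a longer string fails.
print(fnmatchcase("aunt", "a?t"))  # prints False
```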