User Guide
158 Chapter 8 Verity Spider
-submitsize
Syntax: -submitsize num_documents
Specifies the number of documents submitted for indexing at one time. The default
value is 128. The upper limit is 64,000.
Note
Although larger values mean more efficient processing by the indexer, smaller values
will allow more parallelism on multi-CPU systems. Furthermore, in the event of a
halt during indexing, a smaller value means fewer documents will be lost.
If a halt occurs during indexing, the chunk of documents specified by -submitsize is
lost, because indexing has no transactional rollback and those documents are no
longer in the queue for indexing. Remember that when you re-run the indexing task,
Verity Spider can continue only with the URLs and documents that are still enqueued.
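The trade-off described above can be made concrete with a small calculation. The
following is a minimal Python sketch, not part of Verity Spider: it only models the
documented behavior (documents are submitted in chunks of -submitsize, and a halt
loses the chunk being indexed). The function names are hypothetical, and the lower
bound of 1 is an assumption, since this guide states only the default (128) and the
upper limit (64,000).

```python
import math

DEFAULT_SUBMITSIZE = 128   # default stated in this guide
MAX_SUBMITSIZE = 64_000    # upper limit stated in this guide


def check_submitsize(num_documents: int = DEFAULT_SUBMITSIZE) -> int:
    """Return num_documents if it lies in the documented range, else raise.

    The lower bound of 1 is an assumption; the guide gives only the upper limit.
    """
    if not 1 <= num_documents <= MAX_SUBMITSIZE:
        raise ValueError(
            f"-submitsize must be between 1 and {MAX_SUBMITSIZE}, got {num_documents}"
        )
    return num_documents


def submission_plan(queued_documents: int, submitsize: int = DEFAULT_SUBMITSIZE):
    """Return (number of submissions, worst-case documents lost on one halt)."""
    submitsize = check_submitsize(submitsize)
    submissions = math.ceil(queued_documents / submitsize)
    # A halt loses at most one chunk, and a chunk never exceeds the queue size.
    worst_case_loss = min(submitsize, queued_documents)
    return submissions, worst_case_loss


if __name__ == "__main__":
    # Compare a few candidate values for a hypothetical queue of 100,000 documents.
    for size in (128, 1024, 64_000):
        print(size, submission_plan(100_000, size))
```

A larger value reduces the number of submissions the indexer must process, while a
smaller value reduces the number of documents exposed to loss in a single halt.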
-temp
Syntax: -temp path
Specifies the directory for temporary files (disk cache). By default, the temp directory
is contained within the job directory (optionally specified with the -jobpath option).
If you do not specify a value for this option, Verity Spider creates a /spider/temp
directory within the collection. For multiple-collection tasks, the first collection
specified is used.
Note
Make sure the location you specify contains enough disk space to handle the
documents that are downloaded and held before indexing. The documents are
deleted from the hard disk after they are indexed.
See also -jobpath, which specifies the location of all indexing job directories and
files, one of which is the temp directory.
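Before pointing -temp at a location, it can help to confirm that the location has
enough free space for the documents held before indexing. The following is a minimal
Python sketch using the standard library; it is not part of Verity Spider. The
function name, the candidate path, and the size estimate are all hypothetical
placeholders that you would replace with values for your own site.

```python
import shutil
import tempfile


def has_enough_space(path: str, required_bytes: int) -> bool:
    """Return True if the file system holding path has required_bytes free."""
    free = shutil.disk_usage(path).free
    return free >= required_bytes


if __name__ == "__main__":
    # Stand-in for the directory you intend to pass to -temp (hypothetical).
    candidate = tempfile.gettempdir()
    # Hypothetical estimate: 128 documents held at once, about 2 MB each.
    estimate = 128 * 2 * 1024 * 1024
    print(candidate, has_enough_space(candidate, estimate))
```

The estimate is only a rough guide, because actual document sizes depend on the
content being spidered.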