User Guide
Chapter 9: Indexing Collections with Verity Spider
-maxindmem
Syntax:
-maxindmem kilobytes
Specifies the maximum amount of memory, in kilobytes, used by each indexing thread. Specify
the number of threads with the -indexers option.
By default, each indexing thread uses as much memory as is available from the system.
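Because the limit applies to each thread, the worst-case indexing memory is roughly the
-maxindmem value multiplied by the -indexers value. The following Python sketch only
illustrates that arithmetic; it is not part of Verity Spider, and the function name and
sample values are assumptions.

```python
def max_indexing_memory_kb(indexers: int, maxindmem_kb: int) -> int:
    """Upper bound, in kilobytes, on memory used by all indexing threads combined."""
    if indexers < 1 or maxindmem_kb < 1:
        raise ValueError("indexers and maxindmem_kb must both be positive")
    return indexers * maxindmem_kb


# Hypothetical values: 4 indexing threads, each capped at 32768 KB.
print(max_indexing_memory_kb(4, 32768))  # 131072
```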
-maxnumdoc
Syntax:
-maxnumdoc num_docs
Specifies the maximum number of documents to download or submit for indexing. The value for
num_docs does not necessarily correspond to the number of documents indexed. The following
factors affect the actual number:
• Whether the value of num_docs falls within a block of documents dictated by the
-submitsize option. If it does, the entire block of documents must be processed.
• Whether retrieved documents are actually indexed. Documents that are invalid or corrupt are
retrieved but not indexed.
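The first factor can be sketched as rounding num_docs up to a whole block. The Python below
is an illustration only. It assumes that blocks contain exactly -submitsize documents and
are counted from the first document, which this guide does not state explicitly. The
function name is hypothetical, and the number of documents actually indexed can be lower
because of the second factor.

```python
def docs_processed_upper_bound(num_docs: int, submitsize: int) -> int:
    """Round num_docs up to the next whole block of submitsize documents."""
    if num_docs < 0 or submitsize < 1:
        raise ValueError("num_docs must be >= 0 and submitsize must be >= 1")
    # Integer ceiling division: number of whole blocks needed to cover num_docs.
    blocks = (num_docs + submitsize - 1) // submitsize
    return blocks * submitsize


# Hypothetical values: a limit of 250 documents with blocks of 128.
print(docs_processed_upper_bound(250, 128))  # 256
```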
-mimemap
Syntax:
-mimemap path_and_filename
Specifies a control file (simple ASCII text) that maps file extensions to MIME types. This lets you
make custom associations and override the defaults.
The following is the format for the control file:
#file_ext_no_dot mime-type
abc application/word
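To check a control file before passing it to -mimemap, you could parse it with a short
script. The Python sketch below is not Verity Spider's own parser. It assumes that lines
beginning with # are comments (as the header line in the format above suggests) and that
each remaining line holds exactly two whitespace-separated fields. The function name is
hypothetical.

```python
from typing import Dict


def parse_mimemap(text: str) -> Dict[str, str]:
    """Parse control-file text into a mapping of extension -> MIME type."""
    mapping: Dict[str, str] = {}
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Assumption: blank lines and lines starting with '#' carry no mapping.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"line {line_no}: expected 'ext mime-type', got {raw!r}")
        ext, mime_type = fields
        mapping[ext] = mime_type
    return mapping


sample = "#file_ext_no_dot mime-type\nabc application/word\n"
print(parse_mimemap(sample))  # {'abc': 'application/word'}
```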
-nocache
Type: Web crawling only
Used with the -noindex or -nosubmit options, this option disables the caching of files during
website indexing, which reduces the demand on your disk space.
Normally, Verity Spider downloads URLs, then writes them to a bulk insert file and downloads
the documents themselves. When indexing occurs, once the -submitsize value has been
reached, the cached files are indexed and then deleted. If you use the -noindex option, the bulk
insert file is submitted but not processed by Verity Spider, so the cached documents are not
deleted until indexing occurs. Indexing is then usually performed by mkvdk or collsvc, or you
can run Verity Spider again with the -processbif option.
By using the -nocache option in conjunction with the -noindex or -nosubmit option, you
avoid storing files locally. Files are downloaded only when indexing actually occurs.
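If you assemble Verity Spider command lines in a script, a small check can confirm that
-nocache is paired as described. The Python sketch below is an illustration with a
hypothetical function name. This guide does not say how Verity Spider behaves when
-nocache is given alone, so the check only reports whether the pairing is present.

```python
from typing import Sequence


def nocache_is_paired(args: Sequence[str]) -> bool:
    """Return True if -nocache is absent, or present with -noindex or -nosubmit."""
    if "-nocache" not in args:
        return True
    return "-noindex" in args or "-nosubmit" in args


print(nocache_is_paired(["-nocache", "-noindex"]))  # True
print(nocache_is_paired(["-nocache"]))              # False
```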
See also -noindex.