126
CONFIGURING AND ADMINISTERING COLDFUSION 9
Indexing Collections with Verity Spider
Last updated 2/21/2012
-nodupdetect
Type
Web crawling only
Disables checksum-based detection of duplicates when indexing websites. URL-based duplicate detection is still
performed.
By default, a document checksum is computed based on the CRC-32 algorithm. The checksum combined with the
document size is used to determine if the document is a duplicate.
See also
“-followdup” on page 134.
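The checksum-plus-size comparison described above can be illustrated with a short Python sketch. It is not Verity Spider's implementation: the function names and the example URLs are hypothetical, and it only shows how a CRC-32 checksum combined with the document size can flag two documents as duplicates even when their URLs differ.

```python
import zlib
from typing import Dict, Tuple


def fingerprint(content: bytes) -> Tuple[int, int]:
    """Return (CRC-32 checksum, size in bytes) for a document body."""
    # Mask to 32 bits so the checksum is always an unsigned value.
    return (zlib.crc32(content) & 0xFFFFFFFF, len(content))


def find_duplicates(documents: Dict[str, bytes]) -> Dict[str, str]:
    """Map each duplicate URL to the first URL seen with the same fingerprint.

    Hypothetical helper: documents is a mapping of URL to document body.
    """
    seen: Dict[Tuple[int, int], str] = {}
    duplicates: Dict[str, str] = {}
    for url, content in documents.items():
        key = fingerprint(content)
        if key in seen:
            duplicates[url] = seen[key]  # same checksum and same size
        else:
            seen[key] = url
    return duplicates


if __name__ == "__main__":
    docs = {
        "http://example.com/a": b"hello",
        "http://example.com/b": b"hello",  # same body under a different URL
        "http://example.com/c": b"world",
    }
    print(find_duplicates(docs))
```

With -nodupdetect, a comparison of this kind is skipped, so the second URL in the example would be treated as a separate document.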
-noindex
Specifies that Verity Spider gathers document locations without indexing them. The document locations are stored in
a bulk insert file (BIF), which is then submitted to the collection. This option is typically used with a separate indexing
process, such as mkvdk or collection servicers (collsvc). The BIF is processed by the next indexing process run for the
collection, whether it is Verity Spider, mkvdk, or collection servicers (collsvc).
Do not try to start Verity Spider and another process at the same time. Allow Verity Spider time to generate enough
work for the secondary indexing process. If you are using mkvdk, you can run it in persistent mode to ensure that it
acts upon work generated by Verity Spider.
Note: When you execute an indexing job for a collection and you use the -noindex option, the persistent store for the
collection is not updated.
See also
“-nocache” on page 125 and “-nosubmit” on page 126.
For more information on the mkvdk utility, see “Using the mkvdk utility” on page 150.
-nosubmit
Specifies that Verity Spider gathers document locations without submitting them. The document locations are stored
in a bulk insert file (BIF), which is not submitted to the collection. This option is typically used with a separate indexing
process, such as mkvdk or collection servicers (collsvc). You can also use Verity Spider again with the -processbif
option. With an indexing process other than Verity Spider, you must specify the name and path for the BIF, because
the collection has no record of it.
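The difference between -noindex and -nosubmit can be pictured with a toy Python model. This is only a conceptual sketch: the Collection class, its methods, and the BIF paths are hypothetical and do not correspond to any Verity API. It models a submitted BIF as work the collection knows about, and an unsubmitted BIF as work that must be named explicitly.

```python
from typing import Iterable, List


class Collection:
    """Toy stand-in for a collection that tracks submitted BIFs (hypothetical)."""

    def __init__(self) -> None:
        self.pending_bifs: List[str] = []

    def submit(self, bif_path: str) -> None:
        # Record the BIF so that the next indexing run can find it.
        self.pending_bifs.append(bif_path)

    def run_indexer(self, explicit_bifs: Iterable[str] = ()) -> List[str]:
        # Process every BIF the collection knows about, plus any BIFs
        # the caller names explicitly by path.
        work = self.pending_bifs + list(explicit_bifs)
        self.pending_bifs = []
        return work


def spider_gather(collection: Collection, bif_path: str, submit: bool) -> str:
    """Model a gather-only run: submit=True is like -noindex, False like -nosubmit."""
    if submit:
        collection.submit(bif_path)
    return bif_path


if __name__ == "__main__":
    coll = Collection()
    spider_gather(coll, "/tmp/a.bif", submit=True)   # like -noindex
    spider_gather(coll, "/tmp/b.bif", submit=False)  # like -nosubmit
    print(coll.run_indexer())                              # only the submitted BIF
    print(coll.run_indexer(explicit_bifs=["/tmp/b.bif"]))  # BIF named by path
```

In this model, the BIF written under -nosubmit is never picked up unless its path is passed in, which mirrors the requirement to specify the BIF name and path to an indexing process other than Verity Spider.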
-persist
Syntax
-persist num_seconds
Enables Verity Spider to run in persistent mode, checking for updates every num_seconds seconds until it is
stopped.
While Verity Spider is running in persistent mode, the collection is not optimized. After Verity Spider is taken out of
persistent mode, you need to optimize the collection. For more information about using the mkvdk
utility, see “Using the mkvdk utility” on page 150.
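As an illustration of the -persist syntax, the following Python sketch assembles a Verity Spider command line without running it. It rests on several assumptions: the executable is assumed to be named vspider, the -collection and -start options are assumed to be available as described elsewhere in this guide, the collection path and URL are hypothetical, and the check for a positive interval is a choice made for this sketch, not documented behavior.

```python
from typing import List


def build_vspider_args(collection_path: str, start_url: str,
                       persist_seconds: int) -> List[str]:
    """Build an argument list for a persistent-mode Verity Spider run.

    The values passed in are placeholders; adjust them to your installation.
    """
    if persist_seconds <= 0:
        # Assumption of this sketch: the polling interval must be positive.
        raise ValueError("persist_seconds must be a positive number of seconds")
    return [
        "vspider",                          # assumed executable name
        "-collection", collection_path,     # assumed option: target collection
        "-start", start_url,                # assumed option: starting point
        "-persist", str(persist_seconds),   # check for updates at this interval
    ]


if __name__ == "__main__":
    args = build_vspider_args("/opt/collections/docs", "http://example.com/", 3600)
    print(" ".join(args))
```

The resulting list can be passed to a process launcher such as subprocess.run once the paths and options have been verified against your installation.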