CONFIGURING AND ADMINISTERING COLDFUSION 9
Indexing Collections with Verity Spider
Last updated 2/21/2012
Note: By using the -start option with the -refresh option, you provide a starting point for Verity Spider and therefore
do not need to use at least one of the following options: -host, -domain, -nofollow, or -unlimited.
-refresh
Used for updating a collection, specifies that Verity Spider process only those documents that qualify, as follows:
• They are new documents in the repository, and they qualify for indexing under the criteria.
• They exist in the collection and are recorded in the Verity Spider persistent store with a status of done. If Verity
Spider determines that these indexed documents have been updated in the repository, then they are retrieved again
to be reparsed and reindexed. The document VdkVgwKey values do not change.
• They are deleted in the collection. If Verity Spider determines that documents have been deleted from the
repository, then they are also deleted from the persistent store and the collection. The exception to this rule is when
you use the -nooptimize option with the -refresh option. In this case, any document deleted from the repository
is marked for deletion in the collection. It is removed from the collection and the persistent store when the next
indexing task is run for the collection.
When you rerun an existing indexing job, Verity Spider automatically refreshes the collection. If you add or remove
any of the starting points, however, you must manually specify the -refresh option to refresh existing documents.
Note: You can also use the -start option to provide a starting point for Verity Spider. If you do not use the -start
option, use at least one of the following options: -host, -domain, or -nofollow. For further control, also see the
-refreshtime option. If you do not use any constraint criteria, Verity Spider operates without limits and indexes far more
than you intended.
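The rule in this note can be expressed as a small pre-flight check on an argument list. The following Python sketch is illustrative only and is not part of Verity Spider; the function name has_start_or_constraint is hypothetical, and the URL in the usage note is a placeholder. It treats an argument list as bounded when it contains -start or at least one of -host, -domain, or -nofollow.

```python
# Constraint options named in the note above. The helper itself is a
# hypothetical illustration, not a Verity Spider feature.
CONSTRAINT_OPTIONS = {"-host", "-domain", "-nofollow"}


def has_start_or_constraint(args):
    """Return True if args contains -start or at least one constraint option."""
    present = set(args)
    return "-start" in present or bool(present & CONSTRAINT_OPTIONS)
```

For example, has_start_or_constraint(["-refresh", "-start", "http://www.example.com/"]) returns True, while has_start_or_constraint(["-refresh"]) returns False and signals that the job would run without limits.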
Core options
Following are the Verity Spider core options:
-cmdfile
Syntax
-cmdfile path_and_filename
Specifies that Verity Spider reads command-line syntax from a file, in addition to the options passed in the command
line. This option includes the path to the file that contains the command-line syntax. The -cmdfile option
circumvents command-line length limits.
The syntax for the command-file is:
option optional_parameters
For better readability, place each option and any parameters on a single line. Verity Spider can properly parse the lines.
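As a rough illustration of the one-option-per-line layout, the following Python sketch renders a list of options as command-file text. It is a sketch under stated assumptions, not an Adobe or Verity tool: the function name format_command_file is hypothetical, the URL and host name are placeholders, and the parameter shown for -host is assumed to be a host name.

```python
def format_command_file(options):
    """Render (option, parameters) pairs as command-file text, one option per line."""
    lines = []
    for option, params in options:
        # Each line follows the pattern: option optional_parameters
        lines.append(" ".join([option, *params]))
    return "\n".join(lines) + "\n"


# Hypothetical values; substitute the starting point and host for your site.
options = [
    ("-start", ["http://www.example.com/"]),
    ("-host", ["www.example.com"]),
    ("-refresh", []),
]

command_file_text = format_command_file(options)
```

Saving command_file_text to a file and passing that file's path with -cmdfile path_and_filename keeps the command line itself short.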
Repository type   Starting point
Web               The URL or URLs from which Verity Spider is to begin indexing. Use other options, such as the -jumps option, to control how far from the starting point Verity Spider goes.
File              The starting directory or directories in which Verity Spider starts indexing. All subdirectories beneath the starting point are indexed, unless you use the -pathlen option or any of the inclusion or exclusion criteria.