System information

CONFIGURING AND ADMINISTERING COLDFUSION 9
Indexing Collections with Verity Spider
Last updated 2/21/2012
Note: Do not run more than one Verity Spider process in persistent mode. Because Verity Spider is a resource-intensive
process, run it in persistent mode only with an interval of less than one day. For time intervals greater than 12 hours, use
some form of scheduling instead, such as cron jobs on UNIX or the AT command on Windows Server.
-preferred
Type
Web crawling only
Syntax
-preferred exp_1 [exp_n] ...
Specifies a list of hosts or domains that are preferred when retrieving documents for viewing. You can use wildcard
expressions, where the asterisk (*) is for text strings and the question mark (?) is for single characters. To use regular
expressions, also specify the -regexp option. Use the -preferred option when you leave duplicate detection enabled and
do not specify the -nodupdetect option.
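The wildcard semantics described above can be approximated with a short Python sketch. This is only an illustration, not Verity Spider's own matcher: the host names and patterns are hypothetical, and Python's fnmatch module additionally treats square brackets as character classes, which the text above does not mention.

```python
from fnmatch import fnmatchcase


def host_matches(host: str, patterns: list[str]) -> bool:
    """Return True if host matches any pattern (* = any text, ? = one character)."""
    return any(fnmatchcase(host, pattern) for pattern in patterns)


# Hypothetical preferred-host patterns, for illustration only.
preferred = ["*.example.com", "docs?.example.org"]

in_domain = host_matches("www.example.com", preferred)        # * matches "www"
one_char = host_matches("docs1.example.org", preferred)       # ? matches "1"
two_chars = host_matches("docs12.example.org", preferred)     # ? cannot match "12"
print(in_domain, one_char, two_chars)
```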
When indexing, you might encounter a nonpreferred host first. In that case, documents are parsed and followed and
stored as candidates. When duplicates are encountered on another server, which is preferred, the duplicate documents
from the nonpreferred server are skipped. When documents are requested for viewing, they are retrieved from the
preferred server.
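The candidate-then-replace behavior described above can be sketched as follows. This is a simplified model under stated assumptions: duplicates are identified here by a content digest supplied by the caller (how Verity Spider actually detects duplicates is not specified in this section), and the URLs and the function name choose_sources are hypothetical.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def choose_sources(documents, preferred_patterns):
    """Pick one URL per distinct document, favoring preferred hosts.

    documents: iterable of (url, digest) pairs in crawl order, where equal
    digests mean duplicate content (an assumption made for this sketch).
    """
    kept = {}  # digest -> (url, came_from_preferred_host)
    for url, digest in documents:
        host = urlsplit(url).hostname or ""
        is_preferred = any(fnmatchcase(host, p) for p in preferred_patterns)
        if digest not in kept:
            # First sighting: store it, even if the host is nonpreferred.
            kept[digest] = (url, is_preferred)
        elif is_preferred and not kept[digest][1]:
            # A preferred host supersedes the nonpreferred candidate.
            kept[digest] = (url, True)
    return {digest: url for digest, (url, _) in kept.items()}


crawl = [
    ("http://mirror.example.net/a.html", "d1"),  # nonpreferred host seen first
    ("http://www.example.com/a.html", "d1"),     # duplicate on a preferred host
    ("http://mirror.example.net/b.html", "d2"),  # no duplicate elsewhere
]
chosen = choose_sources(crawl, ["*.example.com"])
print(chosen)
```

In this model, a document that exists only on a nonpreferred host is still kept, which matches the description of nonpreferred documents being stored as candidates.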
In Windows, include double-quotation marks around the argument to protect the special characters, such as the
asterisk (*). On UNIX, use single-quotation marks. This is only required when you run the indexing job from a
command line. Quotation marks are not necessary within a command file (the -cmdfile option).
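As an illustration of the UNIX quoting rule, the following Python sketch builds a command-line fragment with the standard shlex module, which single-quotes any argument containing shell-special characters such as the asterisk. The executable name vspider and the host patterns are assumptions for this sketch, and a real indexing job needs further options that are omitted here.

```python
import shlex

# Assumed executable name and hypothetical host patterns, for illustration only.
arguments = ["vspider", "-preferred", "*.example.com", "*.example.org"]

# shlex.join quotes each argument as a POSIX shell requires, so the shell
# passes the asterisks through instead of expanding them as file globs.
command = shlex.join(arguments)
print(command)
```

The same arguments written inside a command file would not need the quotation marks, because no shell interprets them there.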
See also
“-regexp” on page 128.
-prefixmap
Syntax
-prefixmap path_and_filename
Specifies a control file (simple ASCII text) that maps file system paths to web aliases.
With the -abspath option, this option is typically used to create a URL field that is the web equivalent of a file system
path. File system indexing is faster than web crawling over the network. If you use the -prefixmap option to replace
the file system path with the web URL, relative hypertext links in the HTML pages are kept intact when returned in
Verity search results.
The following is the format for the control file:
src_field src_prefix dest_field dest_prefix
If you use backslashes, you must double every one of them so that they are properly escaped; for example:
C:\\test\\docs\\path
For example, to map the filepath /usr/pub/docs to http://web/~verity, use the following:
vdkvgwkey /usr/pub/docs URL http://web/~verity
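The effect of such a mapping can be sketched in Python. This is a minimal model, not Verity's implementation: it assumes a control line that maps /usr/pub/docs to http://web/~verity, that the four control-file fields are separated by whitespace, and that the first matching prefix wins. The function names and the sample document path are hypothetical, and backslash unescaping is not modeled.

```python
def parse_prefix_map(text):
    """Parse control-file lines of the form:
    src_field src_prefix dest_field dest_prefix
    (whitespace-separated fields are an assumption of this sketch)."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        src_field, src_prefix, dest_field, dest_prefix = line.split()
        rules.append((src_field, src_prefix, dest_field, dest_prefix))
    return rules


def apply_prefix_map(record, rules):
    """Return a copy of record with dest_field set by the first matching rule."""
    result = dict(record)
    for src_field, src_prefix, dest_field, dest_prefix in rules:
        value = record.get(src_field, "")
        if value.startswith(src_prefix):
            # Swap the file system prefix for the web prefix; keep the rest.
            result[dest_field] = dest_prefix + value[len(src_prefix):]
            break
    return result


rules = parse_prefix_map("vdkvgwkey /usr/pub/docs URL http://web/~verity")
document = {"vdkvgwkey": "/usr/pub/docs/guide/index.html"}  # hypothetical file
mapped = apply_prefix_map(document, rules)
print(mapped["URL"])
```

Because only the prefix is replaced, the remainder of the path carries over unchanged, which is why relative links between pages stay valid in the resulting URLs.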