User Guide

156 Chapter 8 Verity Spider
Note
Do not run more than one Verity Spider process in persistent mode. Because
Verity Spider is a resource-intensive process, run it in persistent mode only
with an interval of less than one day. For time intervals greater than twelve
hours, use some form of scheduling instead, such as cron jobs on UNIX or the
AT command on Windows NT Server.
-preferred
Syntax: -preferred exp_1 [exp_n] ...
Type: Web crawling only
Specifies a list of hosts or domains to prefer when retrieving documents for
viewing. You can use wildcard expressions, where the asterisk ( * ) matches text
strings and the question mark ( ? ) matches a single character. To use regular
expressions, also specify the -regexp option. Use this option when duplicate
detection is enabled, that is, when you do not specify -nodupdetect.
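As an illustration of the wildcard rules described above, the following Python sketch translates a wildcard expression into a regular expression and tests host names against it. It is not Verity Spider's own matcher. The function names are hypothetical, and two behaviors are assumptions that the text does not confirm: the asterisk is allowed to match an empty string, and matching ignores case.

```python
import re
from typing import Iterable, Pattern


def wildcard_to_regex(expr: str) -> Pattern[str]:
    """Translate a wildcard expression into a compiled regular expression.

    '*' stands for any run of characters (assumed: possibly empty),
    '?' stands for exactly one character, and every other character
    is matched literally.
    """
    parts = []
    for ch in expr:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    # Case-insensitive matching is an assumption made for host names.
    return re.compile("".join(parts), re.IGNORECASE)


def is_preferred(host: str, expressions: Iterable[str]) -> bool:
    """Return True if the whole host name matches any wildcard expression."""
    return any(wildcard_to_regex(e).fullmatch(host) for e in expressions)
```

With these definitions, `is_preferred("www.example.com", ["*.example.com"])` is true, while `is_preferred("web10.example.com", ["web?.example.com"])` is false because the question mark stands for only one character.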
When indexing, Verity Spider may encounter a non-preferred host first. In that
case, its documents are parsed, their links are followed, and the documents are
stored as candidates. When duplicates are later encountered on a preferred
server, the duplicate documents from the non-preferred server are skipped. When
documents are requested for viewing, they are retrieved from the preferred server.
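The candidate-and-replace behavior described above can be sketched in Python as follows. This is only a model of the idea, not the product's implementation. The function name is hypothetical, duplicates are assumed to be detected by a SHA-256 digest of the content (the text does not say how detection works), and preferred hosts are given as exact names rather than wildcard expressions to keep the sketch short.

```python
import hashlib
from urllib.parse import urlsplit


def choose_documents(crawled, preferred_hosts):
    """Return {content_digest: url}, keeping one URL per distinct content.

    crawled: iterable of (url, content_bytes) pairs in crawl order.
    preferred_hosts: set of lowercase host names treated as preferred.
    """
    chosen = {}
    for url, content in crawled:
        # Assumption: identical content means the documents are duplicates.
        digest = hashlib.sha256(content).hexdigest()
        host = (urlsplit(url).hostname or "").lower()
        if digest not in chosen:
            # First sighting: keep it, even if the host is non-preferred.
            chosen[digest] = url
            continue
        current_host = (urlsplit(chosen[digest]).hostname or "").lower()
        if host in preferred_hosts and current_host not in preferred_hosts:
            # A preferred duplicate replaces the non-preferred candidate.
            chosen[digest] = url
    return chosen
```

If a page is first seen on a mirror and later on a preferred host, only the preferred host's URL remains in the result; pages that exist only on the mirror are kept.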
On Windows NT, enclose the argument in double quotes to protect special
characters such as the asterisk ( * ). On UNIX, use single quotes. Quoting is
required only when you run the indexing job from a command line. Quotes are not
necessary within a command file (-cmdfile).
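If the command line is assembled by a script, the quoting rule above can be applied programmatically. The Python sketch below is one way to do that under stated assumptions: the helper names are hypothetical, the UNIX case relies on the standard `shlex.quote` function, and the Windows case simply wraps the argument in double quotes and assumes the argument itself contains no double quotes.

```python
import shlex


def quote_for_unix(arg: str) -> str:
    """Single-quote an argument for a UNIX shell when it needs protection."""
    return shlex.quote(arg)


def quote_for_windows(arg: str) -> str:
    """Wrap an argument in double quotes (assumes no embedded double quotes)."""
    return '"' + arg + '"'


# Build only the option fragment; the rest of the command line is omitted.
unix_fragment = "-preferred " + quote_for_unix("*.example.com")
windows_fragment = "-preferred " + quote_for_windows("*.example.com")
```

Here `unix_fragment` holds `-preferred '*.example.com'` and `windows_fragment` holds `-preferred "*.example.com"`. Inside a command file the expression would be written without quotes.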
See Also
-regexp
-prefixmap
Syntax: -prefixmap path_and_filename
Type: File system only
Specifies a control file (simple ASCII text) that maps file system paths to Web
aliases. In conjunction with -abspath, this option is typically used to create a
URL field that is the Web equivalent of a file system path. File system indexing
is faster than Web crawling over the network. If you use -prefixmap to replace
the file system path with the Web URL, relative hyperlinks in the HTML pages are
kept intact when viewed through Information Server.
The format for the control file is:
src_field src_prefix dest_field dest_prefix
If you use backslashes, you must double them so they are properly escaped. For
example:
C:\\test\\docs\\path
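When a control file is generated by a script, the backslash doubling can be automated. The Python sketch below is a minimal illustration, not part of the product. The function names are hypothetical, the fields are assumed to be separated by single spaces as in the format line above, and any field names or prefixes passed in are the caller's own values.

```python
def escape_backslashes(path: str) -> str:
    """Double every backslash so it is escaped in the control file."""
    return path.replace("\\", "\\\\")


def control_file_line(src_field: str, src_prefix: str,
                      dest_field: str, dest_prefix: str) -> str:
    """Format one mapping as: src_field src_prefix dest_field dest_prefix."""
    fields = [
        src_field,
        escape_backslashes(src_prefix),
        dest_field,
        escape_backslashes(dest_prefix),
    ]
    # Assumption: fields are separated by single spaces.
    return " ".join(fields)
```

For instance, `escape_backslashes(r"C:\test\docs\path")` produces the doubled form shown in the example above.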