User Guide

Paths and URLs Options 167
-reparse
Type: Web crawling only.
Forces parsing of all HTML documents already in the collection. You must specify a
starting point with the -start option when you use -reparse.
You can use -reparse when you want to include paths and documents that were
previously skipped because of exclusion or inclusion criteria. Remember to change the
criteria first; otherwise there will be little for Verity Spider to do. This step is easy
to overlook when you are using -cmdfile.
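The -start requirement can be checked before a run is launched. The following is a minimal sketch, not part of Verity Spider: the helper name build_reparse_args is hypothetical, and the executable name vspider is an assumption.

```python
def build_reparse_args(start_url, extra_args=()):
    """Return an argument list for a -reparse run (hypothetical helper)."""
    if not start_url:
        # The guide requires a starting point whenever -reparse is used.
        raise ValueError("-reparse requires a -start value")
    args = ["vspider"]  # assumed executable name
    args += ["-start", start_url]
    args.append("-reparse")
    # Any changed inclusion or exclusion criteria would be passed here.
    args += list(extra_args)
    return args
```

The returned list could be handed to a process launcher; the sketch only assembles and validates the arguments.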
-unlimited
Specifies that no limits are placed on Verity Spider if neither -host nor -domain is
specified. The default is to limit based on the host of the first starting point listed.
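The interaction between the default limit, -unlimited, -host, and -domain can be pictured as a scope check. This is a simplified sketch under stated assumptions, not Verity Spider's actual logic: the function in_scope is hypothetical, and the matching rules shown for -host (exact host name) and -domain (host name ends with the domain) are assumptions.

```python
from urllib.parse import urlsplit

def in_scope(url, start_urls, hosts=(), domains=(), unlimited=False):
    """Decide whether a URL falls inside the crawl limits (illustrative only)."""
    host = (urlsplit(url).hostname or "").lower()
    if hosts or domains:
        # Assumed semantics: explicit -host / -domain values set the limits,
        # so -unlimited has no effect when either is given.
        return host in hosts or any(
            host == d or host.endswith("." + d) for d in domains
        )
    if unlimited:
        # Neither -host nor -domain given: no limits apply.
        return True
    # Default: limit based on the host of the first starting point listed.
    first_host = (urlsplit(start_urls[0]).hostname or "").lower()
    return host == first_host
```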
-virtualhost
Syntax: -virtualhost name_1 [name_n] ...
Specifies that DNS lookups are avoided for the hosts listed. You must use only
complete text strings for hosts. You may not use wildcard expressions. This allows
you to index by alias, such as when multiple Web servers are running on the same
host. You can use regular expressions.
Normally, when Verity Spider resolves host names, it uses DNS lookups to convert
the names to canonical names, of which there can be only one per machine. This
allows for the detection of duplicate documents, to prevent results from being
diluted. In the case of multiple aliased hosts, however, duplication is not a barrier as
documents can be referred to by more than one alias, and yet remain distinct
because of the different alias names.
Example
You may have both marketing.verity.com and sales.verity.com running on the same
host. Each alias has a different document root, although document names such as
index.htm may occur for both. With -virtualhost, both server aliases can be
indexed as distinct sites. Without -virtualhost, they would both be resolved to the
same host name and only the first document encountered from any duplicate pair
would be indexed.
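The effect described in this example can be modeled in a few lines. This is a simplified sketch, not Verity Spider's implementation: the functions document_key and index are hypothetical, a dictionary stands in for the DNS lookup, and the canonical name web01.verity.com is invented for illustration.

```python
def document_key(host, path, canonical_names, virtual_hosts=()):
    """Build a (host, path) key in simplified form (illustrative only)."""
    if host in virtual_hosts:
        # Hosts listed with -virtualhost skip the lookup and keep their alias.
        return (host, path)
    # Stand-in for a DNS lookup: map the alias to its canonical name.
    return (canonical_names.get(host, host), path)

def index(documents, canonical_names, virtual_hosts=()):
    """Keep only the first document seen for each key."""
    seen = {}
    for host, path in documents:
        key = document_key(host, path, canonical_names, virtual_hosts)
        seen.setdefault(key, (host, path))  # first document encountered wins
    return list(seen.values())

# Both aliases resolve to the same (hypothetical) canonical name.
canonical = {
    "marketing.verity.com": "web01.verity.com",
    "sales.verity.com": "web01.verity.com",
}
docs = [
    ("marketing.verity.com", "/index.htm"),
    ("sales.verity.com", "/index.htm"),
]
without_vhost = index(docs, canonical)
with_vhost = index(docs, canonical,
                   virtual_hosts=("marketing.verity.com", "sales.verity.com"))
```

In this model, without_vhost holds only the marketing.verity.com document, while with_vhost holds both, because each alias keeps its own key.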
Warning! If you are using Netscape Enterprise Server, and you have specified only the
host name as a virtual host, then Verity Spider will not be able to index the virtual
host site. This is because Verity Spider always adds the domain name to the
document key.