System information

CONFIGURING AND ADMINISTERING COLDFUSION 9
Indexing Collections with Verity Spider
Last updated 2/21/2012
-reparse
Type
Web crawling only
Forces parsing of all HTML documents already in the collection. Specify a starting point with the -start option when
you use the -reparse option.
You can use the -reparse option when you want to include paths and documents that were previously skipped due
to exclusion or inclusion criteria. In Verity Spider, make sure that you change the criteria while using the -cmdfile
option.
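The requirement that -reparse be paired with a starting point can be sketched as a small helper that assembles the
argument list. This is an illustrative sketch only: the function name reparse_args is hypothetical, and the
executable name vspider and the -collection option are assumptions taken from outside this section.

```python
def reparse_args(collection, start_url):
    """Assemble an argument list for a reparse run (hypothetical helper)."""
    if not start_url:
        # The text above asks for a starting point whenever -reparse is used.
        raise ValueError("-reparse needs a starting point given with -start")
    # "vspider" and "-collection" are assumptions; -start and -reparse
    # are the options described in this section.
    return ["vspider", "-collection", collection, "-start", start_url, "-reparse"]
```

A list such as this could be passed to a process launcher, or written out for use with the -cmdfile option.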
-unlimited
Specifies that no limits are placed on Verity Spider if the -host or the -domain option is not specified. The default is
to limit based on the host of the first starting point listed.
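The default limit can be pictured with a short sketch. It is not Verity Spider's implementation; the function
in_scope and its parameters are hypothetical, and it models only the rule stated above: without -unlimited, a URL
is followed only if its host matches the host of the first starting point.

```python
from urllib.parse import urlsplit

def in_scope(url, start_urls, unlimited=False):
    """Return True if url would be followed under the rule described above."""
    if unlimited:
        # Models -unlimited when neither -host nor -domain is given.
        return True
    # Default: limit to the host of the first starting point listed.
    first_host = urlsplit(start_urls[0]).hostname
    return urlsplit(url).hostname == first_host
```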
-virtualhost
Syntax
-virtualhost name_1 [name_n] ...
Specifies that DNS lookups are avoided for the hosts listed. Use only complete text strings for hosts. You cannot use
wildcard expressions. This lets you index by alias, such as when multiple web servers are running on the same host.
You can use regular expressions.
Normally, when Verity Spider resolves host names, it uses DNS lookups to convert the names to canonical names, of
which there can be only one per computer. This allows for the detection of duplicate documents, to prevent results
from being diluted. For multiple aliased hosts, however, duplication is not a barrier as documents can be referred to
by more than one alias and yet remain distinct because of the different alias names.
Example
You can have both marketing.verity.com and sales.verity.com running on the same host. Each alias has a different
document root, although document names such as index.htm can occur for both. With the -virtualhost option,
both server aliases can be indexed as distinct sites. Without the -virtualhost option, they would both be resolved to
the same host name, and only the first document encountered from any duplicate pair would be indexed.
Note: If you are using Netscape Enterprise Server, and you have specified only the host name as a virtual host, Verity
Spider cannot index the virtual host site. This is because Verity Spider always adds the domain name to the document key.
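The effect of -virtualhost on duplicate detection can be sketched as follows. This is a conceptual model under
stated assumptions, not Verity Spider's code: the functions document_key and index are hypothetical, a dictionary
stands in for the DNS lookup, and the canonical name web1.verity.com is invented for illustration.

```python
def document_key(host, path, canonical_names, virtual_hosts=()):
    """Build the key used to detect duplicate documents (hypothetical model)."""
    if host in virtual_hosts:
        key_host = host  # listed alias: skip the lookup and keep the alias
    else:
        key_host = canonical_names.get(host, host)  # stand-in for a DNS lookup
    return "http://" + key_host + path

def index(urls, canonical_names, virtual_hosts=()):
    """Keep only the first document seen for each key."""
    seen = {}
    for host, path in urls:
        key = document_key(host, path, canonical_names, virtual_hosts)
        seen.setdefault(key, (host, path))
    return seen

# Both aliases resolve to one assumed canonical name.
canonical = {"marketing.verity.com": "web1.verity.com",
             "sales.verity.com": "web1.verity.com"}
urls = [("marketing.verity.com", "/index.htm"), ("sales.verity.com", "/index.htm")]
aliases = ("marketing.verity.com", "sales.verity.com")
```

With these inputs, index(urls, canonical) keeps one entry because both aliases collapse to the same key, while
index(urls, canonical, aliases) keeps two, one per alias.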
Content options
The Verity Spider content options are:
-casesen
Makes processing case sensitive by specifying that the spider separately process keys that differ only in case. Use only
for indexing UNIX servers.
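A minimal sketch of what case-sensitive key processing means in practice. It assumes that, without -casesen, keys
that differ only in case are treated as one key; modeling that by lowercasing is an assumption made for
illustration, and the function names are hypothetical.

```python
def normalize_key(key, casesen=False):
    """Return the form of the key used for comparison (hypothetical model)."""
    # Lowercasing is an assumed way to make keys compare case-insensitively.
    return key if casesen else key.lower()

def unique_keys(keys, casesen=False):
    """Return the distinct keys that would be processed separately."""
    return sorted({normalize_key(k, casesen) for k in keys})
```

For the keys /Docs/Index.htm and /docs/index.htm, the sketch yields one key by default and two keys when casesen is
True, which matches a UNIX file system where the two paths can name different files.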