User Guide

160 Chapter 8 Verity Spider
For example, previous versions of Verity Spider did not support the "Host" header,
which is needed for Virtual Host indexing. Also, a "Proxy-authentication" header was
needed to pass a username and password to a proxy server.
In Verity Spider V3.7, the "Host" header is supported by default, and the -proxyauth
option is available for proxy server authentication. Therefore, the -header option is
maintained only for backward compatibility and possible future enhancements.
Note
Misuse of this option causes the spider to fail. If this happens, rerun the indexing
task with corrected -header values.
-hostcache
Syntax: -hostcache num_hostnames
Specifies the number of hostnames to cache to avoid DNS lookups. Without this
option, the host cache will continue to grow.
The default value is 256.
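
The idea of a size-limited hostname cache can be sketched in Python as follows. This is an illustration only, not Verity Spider's implementation: the class name HostCache, the injectable resolver parameter, and the least-recently-used eviction policy are all assumptions, since the guide does not say which entry is dropped when the limit is reached.

```python
from collections import OrderedDict
import socket


class HostCache:
    """Bounded hostname-to-address cache (hypothetical; LRU eviction is an assumption)."""

    def __init__(self, max_hostnames=256, resolver=socket.gethostbyname):
        self.max_hostnames = max_hostnames
        self.resolver = resolver            # injectable so the sketch can run offline
        self._entries = OrderedDict()       # hostname -> address, oldest first

    def lookup(self, hostname):
        key = hostname.lower()
        if key in self._entries:
            self._entries.move_to_end(key)  # cache hit: mark as most recently used
            return self._entries[key]
        address = self.resolver(key)        # cache miss: one DNS lookup
        self._entries[key] = address
        if len(self._entries) > self.max_hostnames:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return address

    def __len__(self):
        return len(self._entries)
```

A repeated lookup of the same hostname is answered from the cache, so the resolver is called only once per cached name until that name is evicted.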
-noflowctrl
Type: Web crawling only.
Disables round-robin indexing of Web sites with network flow control.
By default, Verity Spider uses round-robin indexing of Web sites to avoid
overwhelming a Web server and to improve indexing performance. Verity Spider
connects to each Web server in a round-robin manner, using up to the number of
connections specified by -connections. This means that one URL is fetched from each
Web server in turn.
Note
Using -noflowctrl may result in a significant drop in performance.
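
The round-robin ordering described for the default behavior can be sketched in Python as follows. This is a simplified illustration, not Verity Spider's scheduler: the function name round_robin_order is hypothetical, and the sketch models only the fetch order, not the -connections limit or any network flow control.

```python
from collections import OrderedDict, deque
from urllib.parse import urlsplit


def round_robin_order(urls):
    """Yield URLs one per host in turn, so no single server gets a burst of requests."""
    queues = OrderedDict()                  # host -> pending URLs, in first-seen order
    for url in urls:
        host = urlsplit(url).netloc.lower()
        queues.setdefault(host, deque()).append(url)
    while queues:
        for host in list(queues):           # snapshot, because drained hosts are deleted
            yield queues[host].popleft()    # take exactly one URL from this host
            if not queues[host]:
                del queues[host]
```

Given several pending URLs for one host and a single URL each for two others, the generator visits each host once before returning to the first host for its next URL.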
-noproxy
Syntax: -noproxy name_1 [name_n] ...
Type: Web crawling only.
Used in conjunction with -proxy, -noproxy specifies that Verity Spider access the
hosts whose names match those specified directly, without going through the proxy.
By default, when -proxy is specified, Verity Spider first tries to access every host
through the proxy.
To improve performance, use -noproxy for those hosts you know can be accessed
without a proxy host. For the name variable, you can use the asterisk ( * ) wildcard for
text strings. For example:
'*.verity.com'
You cannot use the question mark ( ? ) wildcard, and specifying the -regexp option
does not enable regular expressions for -noproxy names.
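
Asterisk-only name matching of this kind can be sketched in Python as follows. This is an illustration under stated assumptions, not Verity Spider's matching code: the function name bypasses_proxy is hypothetical, and case-insensitive comparison is assumed because hostnames are not case sensitive.

```python
import re


def bypasses_proxy(hostname, noproxy_names):
    """Return True if hostname matches any name; only '*' acts as a wildcard."""
    for name in noproxy_names:
        # Escape every character, then turn the escaped asterisk back into "any text".
        # A question mark stays escaped, so it is matched literally, not as a wildcard.
        pattern = re.escape(name).replace(r"\*", ".*")
        if re.fullmatch(pattern, hostname, flags=re.IGNORECASE):
            return True
    return False
```

In this sketch, '*.verity.com' matches www.verity.com but not the bare name verity.com, because the pattern requires a dot before "verity". Whether Verity Spider treats the bare domain the same way is not stated in this guide.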