User Guide
Networking options
-header
Type: Web crawling only
Syntax:
-header string
Specifies an HTTP header to add to the spidering request; for example:
-header "Referer: http://www.verity.com/"
Verity Spider sends some predefined headers, such as Accept and User-Agent, by default. Special
headers are sometimes necessary to correctly index a site.
For example, earlier versions of Verity Spider did not support the Host header, which is needed
for Virtual Host indexing. Also, a Proxy-authentication header was needed to pass a username
and password to a proxy server.
In Verity Spider V3.7, the Host header is supported by default, and the -proxyauth option is
available for proxy server authentication. Therefore, the -header option is maintained only for
backwards compatibility and possible future enhancements.
Note: Misuse of this option causes spider failure. If this happens, rerun the indexing task with
modified -header values.
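The effect of a -header string on an outgoing request can be illustrated with a minimal Python
sketch. It is not Verity Spider code: the helper names parse_header_option and build_request are
hypothetical, and the sketch only assumes that the string has the form "Name: value", as in the
Referer example above.

```python
from urllib.request import Request


def parse_header_option(value):
    """Split a 'Name: value' string into a (name, value) pair."""
    # Split on the first colon only, so URLs in the value stay intact.
    name, separator, content = value.partition(":")
    if not separator or not name.strip():
        raise ValueError("not a valid header string: %r" % value)
    return name.strip(), content.strip()


def build_request(url, extra_headers):
    """Build a request object carrying each extra header (hypothetical helper)."""
    request = Request(url)
    for raw in extra_headers:
        name, content = parse_header_option(raw)
        request.add_header(name, content)
    return request


# No network traffic happens here; the request is only constructed.
request = build_request("http://www.verity.com/", ["Referer: http://www.verity.com/"])
```

A string without a colon raises ValueError in this sketch, which mirrors the point of the note
above: a malformed header value is better rejected before the request is sent.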
-hostcache
Syntax:
-hostcache num_hostnames
Specifies the number of host names to cache to avoid DNS lookups. Without this option, the
host cache continues to grow.
The default value is 256.
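The idea of a host cache with a fixed number of entries can be sketched in Python as follows.
This is an illustration only: the class name HostCache, the injected resolve function, and the
least-recently-used eviction policy are all assumptions, because this guide does not describe
how Verity Spider chooses which entry to discard.

```python
from collections import OrderedDict


class HostCache:
    """Bounded hostname-to-address cache (eviction policy is an assumption)."""

    def __init__(self, resolve, max_hostnames=256):
        self._resolve = resolve          # function that performs the real lookup
        self._max = max_hostnames        # upper bound on cached host names
        self._entries = OrderedDict()

    def lookup(self, hostname):
        if hostname in self._entries:
            # Cache hit: mark as most recently used, no DNS lookup needed.
            self._entries.move_to_end(hostname)
            return self._entries[hostname]
        address = self._resolve(hostname)
        self._entries[hostname] = address
        if len(self._entries) > self._max:
            # Drop the least recently used entry to keep the cache bounded.
            self._entries.popitem(last=False)
        return address

    def __len__(self):
        return len(self._entries)


calls = []


def fake_resolve(hostname):
    """Stand-in for a DNS lookup; records each call and returns a fixed address."""
    calls.append(hostname)
    return "192.0.2.1"


cache = HostCache(fake_resolve, max_hostnames=2)
for name in ["a.example", "b.example", "a.example", "c.example"]:
    cache.lookup(name)
```

In this run, the second request for a.example is served from the cache, so only three lookups
reach the resolver, and the cache never holds more than two host names.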
-noflowctrl
Type: Web crawling only
Disables round-robin indexing of websites with network flow control.
By default, Verity Spider uses round-robin indexing of websites to avoid overwhelming a web
server and to improve indexing performance. Verity Spider connects to each web server in a
round-robin manner, using up to the number of connections set by the -connections option. This
means that one URL is fetched from each web server, in turn.
Note: Using the -noflowctrl option can result in a significant drop in performance.
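The round-robin order described above, one URL from each web server in turn, can be sketched in
Python. The function name round_robin_order and the example.com-style URLs are hypothetical, and
the sketch ignores concurrency and the -connections limit; it shows only the fetch order.

```python
from collections import OrderedDict, deque
from urllib.parse import urlsplit


def round_robin_order(urls):
    """Return the URLs reordered so that hosts are visited in turn."""
    # Group the pending URLs into one queue per host, keeping first-seen order.
    queues = OrderedDict()
    for url in urls:
        host = urlsplit(url).netloc
        queues.setdefault(host, deque()).append(url)

    ordered = []
    while queues:
        # Take one URL from each host per pass; drop hosts with nothing left.
        for host in list(queues):
            ordered.append(queues[host].popleft())
            if not queues[host]:
                del queues[host]
    return ordered


pending = [
    "http://a.example/1",
    "http://a.example/2",
    "http://b.example/1",
    "http://a.example/3",
    "http://b.example/2",
]
order = round_robin_order(pending)
```

With this input, requests alternate between the two hosts until b.example is exhausted, so no
single server receives a long run of consecutive requests.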
-noproxy
Type: Web crawling only
Syntax:
-noproxy name_1 [name_n] ...
Used in conjunction with the -proxy option, the -noproxy option specifies that Verity Spider
directly access the hosts whose names match those specified. By default, when you specify the
-proxy option, Verity Spider first tries to access every host with the proxy information. To
improve performance, use the -noproxy option for the hosts you know can be accessed without a
proxy host. For the name variable, you can use the asterisk (*) wildcard for text strings; for
example:
'*.verity.com'
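How a wildcard name separates direct hosts from proxied hosts can be sketched in Python. The
function name uses_proxy is hypothetical, and the matching here relies on Python's fnmatch
module, which also accepts ? and [] patterns that this guide does not mention; case-insensitive
comparison is likewise an assumption.

```python
from fnmatch import fnmatchcase


def uses_proxy(hostname, noproxy_patterns):
    """Return False when the host matches a no-proxy pattern, True otherwise."""
    host = hostname.lower()
    # A host is accessed directly as soon as any pattern matches it.
    return not any(fnmatchcase(host, pattern.lower()) for pattern in noproxy_patterns)


patterns = ["*.verity.com"]
direct = uses_proxy("www.verity.com", patterns)      # matches the pattern
proxied = uses_proxy("www.example.com", patterns)    # does not match
```

Note that under this matching rule the pattern '*.verity.com' does not match the bare name
verity.com, because the pattern requires a dot before the domain.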