CONFIGURING AND ADMINISTERING COLDFUSION 9

Indexing Collections with Verity Spider

Last updated 2/21/2012

Flow control

When indexing websites, Verity Spider distributes requests to web servers in a round-robin manner. This means that one URL is fetched from each web server in turn. With flow control, a faster website can finish before a slower one. Verity Spider optimizes indexing on every web server.
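The round-robin distribution described above can be illustrated with a minimal Python sketch. It is not Verity Spider's implementation; the function name `round_robin_order` and the `example` host names are hypothetical, and the sketch only shows the ordering idea of taking one URL from each web server in turn.

```python
from collections import deque
from urllib.parse import urlsplit


def round_robin_order(urls):
    """Return URLs ordered so that one URL is taken from each host in turn."""
    # Group pending URLs by host, preserving the order hosts were first seen.
    queues = {}
    for url in urls:
        queues.setdefault(urlsplit(url).netloc, deque()).append(url)

    ordered = []
    hosts = deque(queues)
    while hosts:
        host = hosts.popleft()
        ordered.append(queues[host].popleft())
        # A host that still has URLs goes to the back of the rotation.
        if queues[host]:
            hosts.append(host)
    return ordered


if __name__ == "__main__":
    pending = [
        "http://a.example/1", "http://a.example/2", "http://a.example/3",
        "http://b.example/1",
        "http://c.example/1", "http://c.example/2",
    ]
    for url in round_robin_order(pending):
        print(url)
```

In this sketch a host with few URLs simply drops out of the rotation early, which mirrors the statement that one website can finish before another.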

Verity Spider adjusts the number of connections per server depending on the download bandwidth. When the download bandwidth from a web server falls below a certain value, Verity Spider automatically scales back the number of connections to that web server. There is always at least one connection to a web server. When the download bandwidth increases to an acceptable level, Verity Spider reallocates connections (per the value of the -connections option, which is 4 by default). You can turn off flow control with the -noflowctrl option.
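As a rough illustration of this behavior, the following Python sketch adjusts a per-server connection count from a measured bandwidth. The threshold `low_water`, the one-connection-at-a-time step, and the function name are assumptions made for the example; the actual cutoff and reallocation policy are internal to Verity Spider and are not documented here. Only the floor of one connection and the default limit of 4 come from the text above.

```python
def adjust_connections(current, bandwidth, low_water, max_connections=4):
    """Return the new connection count for one web server.

    bandwidth and low_water use the same unit (for example KB/s).
    low_water is a hypothetical threshold chosen for this sketch.
    max_connections stands in for the -connections value (4 by default).
    """
    if bandwidth < low_water:
        # Scale back, but always keep at least one connection open.
        return max(1, current - 1)
    # Bandwidth is acceptable: grow back toward the configured limit.
    return min(max_connections, current + 1)


if __name__ == "__main__":
    connections = 4
    # Hypothetical bandwidth samples for one server.
    for sample in (80, 30, 20, 10, 5, 60, 90, 95):
        connections = adjust_connections(connections, sample, low_water=50)
        print(sample, connections)
```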

Multi-threading

Verity Spider separates the gathering and indexing jobs into multiple threads for concurrency. Additionally, Verity Spider can create concurrent connections to web servers for fetching documents, and can run concurrent indexing threads for maximum utilization. This translates to an overall improvement in throughput.
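The separation of gathering and indexing into concurrent threads resembles a producer-consumer pipeline. The sketch below shows that general pattern in Python under stated assumptions: `run_pipeline`, `fetch`, and `index` are hypothetical names, the thread counts are arbitrary, and nothing here reflects how Verity Spider is actually built.

```python
import queue
import threading


def run_pipeline(urls, fetch, index, gatherers=4, indexers=2):
    """Fetch documents and index them on separate groups of threads."""
    todo = queue.Queue()
    fetched = queue.Queue()
    for url in urls:
        todo.put(url)

    def gather():
        # Gathering thread: pull URLs until none are left.
        while True:
            try:
                url = todo.get_nowait()
            except queue.Empty:
                return
            fetched.put((url, fetch(url)))

    def build_index():
        # Indexing thread: consume fetched documents until the sentinel.
        while True:
            item = fetched.get()
            if item is None:
                return
            index(*item)

    gather_threads = [threading.Thread(target=gather) for _ in range(gatherers)]
    index_threads = [threading.Thread(target=build_index) for _ in range(indexers)]
    for thread in gather_threads + index_threads:
        thread.start()
    for thread in gather_threads:
        thread.join()
    # All documents are queued; send one sentinel per indexing thread.
    for _ in index_threads:
        fetched.put(None)
    for thread in index_threads:
        thread.join()
```

Because indexing starts as soon as the first document arrives, slow fetches and slow indexing overlap instead of running back to back, which is where the throughput gain comes from.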

Efficient DNS lookups

Verity Spider minimizes DNS lookups, which greatly improves throughput. If lookups are limited by domain or host, then no DNS lookups are made on hosts that fall outside that range. In earlier versions, DNS lookups were made on all candidate URLs.
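The idea of skipping DNS lookups for out-of-range hosts can be sketched as follows. This is an illustration only: the function names, the suffix-matching rule for domains, and the per-host cache are assumptions for the example and are not taken from Verity Spider. The `resolver` parameter defaults to the standard `socket.gethostbyname` and can be replaced for testing.

```python
import socket
from urllib.parse import urlsplit


def host_in_scope(host, allowed_domains):
    """Return True if host equals, or is a subdomain of, an allowed domain."""
    host = host.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


def resolve_in_scope(urls, allowed_domains, resolver=socket.gethostbyname):
    """Resolve each in-scope host once; never look up out-of-scope hosts."""
    cache = {}
    for url in urls:
        host = urlsplit(url).hostname
        if host is None or not host_in_scope(host, allowed_domains):
            continue  # outside the limit: skip the DNS lookup entirely
        if host not in cache:
            cache[host] = resolver(host)
    return cache
```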

Proxy handling efficiency

To allow for greater flexibility when dealing with indexing jobs that involve proxy servers and firewalls, use the following options:

-noproxy To reduce proxy checking for certain hosts

-proxyauth To authenticate on proxy servers

About Verity Spider syntax

Before you create an indexing task for a new collection, make copies of the relevant default style files to ensure that you have a set of template style files in a known, stable state.

Running multiple simultaneous Verity Spider jobs can cause performance problems for searches. This does not mean that you should never run indexing jobs when users might be searching, because your collections remain available for searching even while indexing jobs are running. To optimize performance, stagger your indexing jobs to avoid overloading your server.

The Verity Spider command

The vspider executable file, which starts the Verity Spider utility, is located in the platform/bin directory, as follows:

Server and multiserver configuration: The vspider.exe (Windows) or vspider (UNIX) file is located in cf_root/verity/k2/platform/bin (server configuration) or jrun_root/verity/k2/platform/bin (multiserver configuration), where platform is _nti40 for Windows, _solaris for Solaris, or _ilnx21 for Linux.

J2EE configuration: The vspider.exe (Windows) or vspider (UNIX) file is located in verity_root/k2/platform/bin, where platform is _nti40 for Windows, _solaris for Solaris, or _ilnx21 for Linux.
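The location rules above can be summarized in a small Python helper. The helper name `vspider_path` and the configuration and platform keys are hypothetical, and the root directories in the usage lines are placeholders for your own cf_root, jrun_root, or verity_root. Only the directory layout and the platform directory names come from the text.

```python
import posixpath

# Platform directory names as listed in the text above.
PLATFORM_DIRS = {"windows": "_nti40", "solaris": "_solaris", "linux": "_ilnx21"}


def vspider_path(root, configuration, platform):
    """Build the vspider location for a configuration and platform.

    root is cf_root, jrun_root, or verity_root, depending on configuration.
    """
    exe = "vspider.exe" if platform == "windows" else "vspider"
    if configuration in ("server", "multiserver"):
        parts = (root, "verity", "k2")
    elif configuration == "j2ee":
        parts = (root, "k2")
    else:
        raise ValueError(f"unknown configuration: {configuration}")
    return posixpath.join(*parts, PLATFORM_DIRS[platform], "bin", exe)


if __name__ == "__main__":
    # Placeholder roots; substitute the directories of your installation.
    print(vspider_path("/opt/coldfusion9", "server", "linux"))
    print(vspider_path("C:/verity", "j2ee", "windows"))
```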

At its most basic level, a Verity Spider command consists of the following: