System information

Last updated 2/21/2012
Chapter 12: Indexing Collections with Verity Spider
Use the Verity Spider utility to index documents on your website and build collections that are searchable by the user.
About Verity Spider
Verity Spider enables you to index web-based and file system documents throughout your enterprise. Verity Spider
lets you index more than two hundred application document formats, including Microsoft Office, WordPerfect, ASCII
text, HTML, SGML, XML, and PDF (Adobe Acrobat) documents.
Another advantage of Verity Spider is that the index created by the vspider command includes dynamic content.
Indexes created with the cfindex tag or through the ColdFusion Administrator do not include dynamic content.
The Verity Spider that is included with ColdFusion is licensed for websites that are defined and reside on the same host
on which ColdFusion is installed. Contact Verity Sales for licensing options regarding the use of Verity Spider for
external websites.
Web standard support
Verity Spider supports key web standards used by Internet and intranet sites. Standard href links and frame pointers
are recognized, so that navigation through them is supported. Redirected pages are followed so that the real underlying
document is indexed. Verity Spider adheres to the robots exclusion standard specified in robots.txt files, so that
administrators can maintain friendly visits to remote websites. The HTTP Basic Authentication mechanism is supported so
that password-protected sites can be indexed.
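The two standards named above can be illustrated with a minimal Python sketch. It is not Verity Spider code and does not reflect how Verity Spider is implemented; it only shows what honoring a robots.txt rule and building an HTTP Basic Authentication header look like in general. The robots.txt content, the user agent name example-spider, the URLs, and the credentials are all hypothetical.

```python
import base64
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for an example site (not taken from any real server).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_robot_rules(robots_text: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set without touching the network."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def basic_auth_header(user: str, password: str) -> str:
    """Return the value of an HTTP Basic Authentication 'Authorization' header."""
    raw = f"{user}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


rules = build_robot_rules(ROBOTS_TXT)

# A polite crawler asks the rule set before fetching each URL.
allowed = rules.can_fetch("example-spider", "http://localhost/docs/index.html")
blocked = rules.can_fetch("example-spider", "http://localhost/private/report.html")

# Credentials are sent as a base64-encoded "user:password" pair.
header = basic_auth_header("user", "secret")
```

Basic Authentication only encodes the credentials; it does not encrypt them, which is one reason it is typically combined with a secure transport.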
Restart capability
When an indexing job fails, or for some reason Verity Spider cannot index a significant number or type of URLs, you
can restart the indexing job to update the collection. Only those URLs that were not successfully indexed previously
are processed.
State maintenance through a persistent store
Verity Spider stores the state of gathered and indexed URLs in a persistent store, which lets it track progress for the
purposes of gracefully and efficiently restarting stopped indexing jobs.
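The idea behind restarting from a persistent store can be sketched in Python as follows. This is an illustration only, under the assumption that each URL carries a simple state value; the table layout, the state names (pending, indexed, failed), and the helper functions are hypothetical and say nothing about the format of the store that Verity Spider actually uses.

```python
import sqlite3
from typing import Callable, Iterable, List


def open_store(path: str) -> sqlite3.Connection:
    """Open (or create) a store that records the state of every known URL."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS urls (url TEXT PRIMARY KEY, state TEXT NOT NULL)"
    )
    return conn


def enqueue(conn: sqlite3.Connection, urls: Iterable[str]) -> None:
    """Record newly gathered URLs as pending; URLs already known are left alone."""
    conn.executemany(
        "INSERT OR IGNORE INTO urls (url, state) VALUES (?, 'pending')",
        [(u,) for u in urls],
    )
    conn.commit()


def run_job(conn: sqlite3.Connection, index_one: Callable[[str], bool]) -> List[str]:
    """Attempt every URL not yet marked 'indexed' and return the URLs attempted."""
    rows = conn.execute(
        "SELECT url FROM urls WHERE state != 'indexed' ORDER BY url"
    ).fetchall()
    attempted = []
    for (url,) in rows:
        state = "indexed" if index_one(url) else "failed"
        # Commit after each URL so that an interrupted job loses no progress.
        conn.execute("UPDATE urls SET state = ? WHERE url = ?", (state, url))
        conn.commit()
        attempted.append(url)
    return attempted


# Usage: an in-memory store keeps the example self-contained; a file path
# would make the state survive between runs.
store = open_store(":memory:")
enqueue(store, ["http://localhost/a", "http://localhost/b", "http://localhost/c"])
first_run = run_job(store, lambda url: not url.endswith("/b"))  # /b fails
restart = run_job(store, lambda url: True)  # only /b is retried
```

Because the state is written after every URL, a restarted job skips everything already marked as indexed and processes only the remainder.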
Performance
Verity Spider performs substantially better than previous versions because of its low memory requirements, flow
control, multithreading, and efficient Domain Name System (DNS) lookups.
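As a general illustration of why multithreading and efficient DNS lookups help a crawler, the following Python sketch resolves host names on a small thread pool and caches each result so that repeated URLs on the same host do not trigger repeated lookups. It is not Verity Spider code; the function names, the worker count, and the stand-in resolver are assumptions made for the example.

```python
import socket
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache
from typing import Callable, Dict, Iterable, Tuple
from urllib.parse import urlsplit


def make_cached_resolver(
    resolve: Callable[[str], str] = socket.gethostbyname, maxsize: int = 1024
) -> Callable[[str], str]:
    """Wrap a host-name resolver so that each host is looked up once and then cached."""

    @lru_cache(maxsize=maxsize)
    def cached(host: str) -> str:
        return resolve(host)

    return cached


def resolve_urls(
    urls: Iterable[str], resolver: Callable[[str], str], workers: int = 4
) -> Dict[str, str]:
    """Resolve the host of every URL on a thread pool and map each URL to an address."""

    def lookup(url: str) -> Tuple[str, str]:
        host = urlsplit(url).hostname or ""
        return url, resolver(host)

    # Threads overlap the waiting time of slow lookups. Two threads that miss
    # the cache at the same moment may both resolve the same host once.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(lookup, urls))


# Usage with a stand-in resolver so that the example needs no network access.
resolver = make_cached_resolver(lambda host: "127.0.0.1")
addresses = resolve_urls(
    ["http://localhost/a.html", "http://localhost/b.html"], resolver
)
```

Passing the resolver in as a parameter keeps the caching and threading logic testable without real DNS traffic; omitting the argument to make_cached_resolver falls back to socket.gethostbyname.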