About Verity Spider syntax
If an indexing task halts, you can rerun the task as-is. The persistent store for the specified
collection is read, and only those candidate URLs that are in the queue but not yet processed are
parsed. Candidate URLs correspond to URLs of the following status, as reported by vsdb:
cand, used, inse, upda, dele, fail
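As a rough illustration of the rule above, the following Python sketch filters a list of URL records down to those a rerun would still process. The (url, status) record shape and the pending_urls name are assumptions made for this example; the actual vsdb report format is not described in this section.

```python
# Status codes that the text lists as candidate URLs when a halted task is rerun.
CANDIDATE_STATUSES = {"cand", "used", "inse", "upda", "dele", "fail"}


def pending_urls(records):
    """Return the URLs whose status marks them as still to be processed.

    `records` is a hypothetical list of (url, status) pairs; it is not the
    real vsdb output format.
    """
    return [url for url, status in records if status in CANDIDATE_STATUSES]


records = [
    ("http://example.com/a.html", "done"),  # already indexed, skipped
    ("http://example.com/b.html", "cand"),  # queued, not yet processed
    ("http://example.com/c.html", "fail"),  # failed earlier, tried again
]
print(pending_urls(records))
```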
Note: By using the -start option with the -refresh option, you provide a starting point for Verity
Spider and therefore are not required to use any of the following options: -host, -domain,
-nofollow, or -unlimited.
-refresh
Used for updating a collection, specifies that Verity Spider process only those documents that
qualify, as follows:
• They are new documents in the repository, and they qualify for indexing under the criteria.
• They exist in the collection and are recorded in the Verity Spider persistent store with a status
of done. If Verity Spider determines that these indexed documents have been updated in the
repository, then they are retrieved again to be reparsed and reindexed. The document
VdkVgwKey values do not change.
• They are deleted in the collection. If Verity Spider determines that documents have been
deleted from the repository, then they are also deleted from the persistent store and the
collection. The exception to this rule is when you use the -nooptimize option with the
-refresh option. In this case, any document deleted from the repository is marked for
deletion in the collection. It will be removed from the collection and the persistent store when
the next indexing task is run for the collection.
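The three qualifying cases above can be sketched as a small decision function. This is only a model of the described behavior, not Verity Spider's implementation: the plan_refresh name, the dictionary inputs, the qualifies predicate, and the use of a modification stamp to detect updates are all assumptions made for illustration.

```python
def plan_refresh(repository, store, nooptimize=False, qualifies=lambda key: True):
    """Decide what a refresh would do with each document.

    repository: dict mapping document key -> last-modified stamp (hypothetical).
    store:      dict mapping document key -> stamp recorded when the document
                was indexed with a status of done (hypothetical).
    qualifies:  predicate standing in for the indexing criteria (hypothetical).
    Returns a dict mapping document key -> action name.
    """
    actions = {}
    for key, modified in repository.items():
        if key not in store:
            if qualifies(key):
                actions[key] = "index"    # new document that meets the criteria
        elif modified > store[key]:
            actions[key] = "reindex"      # updated since indexing; key is unchanged
    for key in store:
        if key not in repository:
            # With -nooptimize the removal is deferred to the next indexing task.
            actions[key] = "mark_for_deletion" if nooptimize else "delete"
    return actions


repository = {"a.html": 5, "b.html": 9, "d.html": 1}
store = {"a.html": 5, "b.html": 7, "c.html": 3}
print(plan_refresh(repository, store))
print(plan_refresh(repository, store, nooptimize=True))
```

In this model an unchanged document (a.html) produces no action, which matches the statement that only qualifying documents are processed.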
When you rerun an existing indexing job, Verity Spider automatically refreshes the collection. If
you add or remove any of the starting points, however, you must manually specify the -refresh
option to refresh existing documents.
Note: You can also use the -start option to provide a starting point for Verity Spider. If you do not
use the -start option, use at least one of the following options: -host, -domain, or -nofollow. For
further control, also see the -refreshtime option. If you do not use any constraint criteria, Verity
Spider operates without limits and will likely index far more than you intended.
Repository type   Starting point
Web               The URL or URLs from which Verity Spider is to begin indexing. Use other
                  options, such as the -jumps option, to control how far from the starting
                  point Verity Spider goes.
File              The starting directory or directories in which Verity Spider will start
                  indexing. All subdirectories beneath the starting point will be indexed,
                  unless you use the -pathlen option or any of the inclusion or exclusion
                  criteria.
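The constraint rule from the note above (use -start, or else at least one of -host, -domain, or -nofollow) can be checked before a command is assembled. The following Python sketch only builds an argument list and does not run Verity Spider. The build_vspider_args name, the single-value argument shapes, and the use of a -collection option to name the collection are assumptions for this example; confirm them against the option reference before relying on them.

```python
def build_vspider_args(collection, start=None, host=None, domain=None,
                       nofollow=None, refresh=False):
    """Assemble a vspider argument list, enforcing the constraint rule.

    Each option takes a single string here; the real argument shapes may
    differ (assumption). Nothing is executed.
    """
    if not start and not (host or domain or nofollow):
        raise ValueError(
            "use -start, or at least one of -host, -domain, or -nofollow")
    # -collection is assumed to name the target collection.
    args = ["vspider", "-collection", collection]
    for flag, value in (("-start", start), ("-host", host),
                        ("-domain", domain), ("-nofollow", nofollow)):
        if value:
            args += [flag, value]
    if refresh:
        args.append("-refresh")
    return args


# Web repository: the starting point is a URL.
web_args = build_vspider_args("/collections/docs",
                              start="http://www.example.com/", refresh=True)
# File repository: the starting point is a directory.
file_args = build_vspider_args("/collections/files", start="/data/reports")
print(" ".join(web_args))
print(" ".join(file_args))
```

Returning a list rather than a single string keeps each value as a separate argument, so a path or URL containing spaces would not need extra quoting if the list were later passed to a process launcher.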