Chapter 9: Indexing Collections with Verity Spider
-norobo
Type: Web crawling only
Specifies to ignore any robots.txt files encountered. The robots.txt file is used on many websites to
specify what parts of the site indexers should avoid. The default is to honor any robots.txt files.
If you are re-indexing a site and the robots.txt file has changed, Verity Spider deletes documents
that have been newly disallowed by the robots.txt file.
Use this option with discretion and extreme care, especially in conjunction with the -cgiok option.
See also -nodocrobo.
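To make the effect of honoring or ignoring robots.txt concrete, the following is a minimal Python sketch that uses the standard library's urllib.robotparser module. It does not show how Verity Spider is implemented. The robots.txt content, the example.com URLs, and the allowed_urls helper with its honor_robots parameter are all hypothetical and exist only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; not taken from any real site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed_urls(urls, robots_txt, honor_robots=True, agent="*"):
    """Return the URLs a crawler would index under the given robots.txt."""
    if not honor_robots:
        # Loosely analogous to -norobo: the robots.txt rules are ignored.
        return list(urls)
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(agent, url)]


urls = [
    "http://www.example.com/index.html",
    "http://www.example.com/private/report.html",
]

# Default behavior: the disallowed URL is filtered out.
print(allowed_urls(urls, ROBOTS_TXT))
# Ignoring robots.txt: both URLs are kept.
print(allowed_urls(urls, ROBOTS_TXT, honor_robots=False))
```

In this sketch, ignoring robots.txt makes the /private/ document eligible for indexing, which is why the option calls for care.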
-pathlen
Syntax: -pathlen num_pathsegments
Limits indexing to the specified number of path segments in the URL or file system path. The
path length is determined as follows:
• The host name and drive letter are not included; for example, neither www.spider.com:80/ nor
C:\ would be included in determining the path length.
• All elements following the host name are included.
• The actual filename, if present, is included; for example, /world.html would be included in
determining the path length.
• Any directory paths between the host and the actual filename are included.
Example
For the following URL, the path length would be four:
http://www.spider:80/comics/fun/funny/world.html
                     <-1--> <2> <-3-> <---4---->
For the following file system path, the path length would be three:
C:\files\docs\datasheets
   <-1-> <2-> <---3---->
The default value is 100 path segments.
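The counting rules above can be sketched in Python as follows. This is an illustration of the rules as stated, not Verity Spider's own code. The function names path_length and within_limit are hypothetical, and treating the limit as inclusive (a path with exactly num_pathsegments segments is still indexed) is an assumption.

```python
from urllib.parse import urlsplit


def path_length(location):
    """Count path segments, excluding the host name, port, and drive letter."""
    if "://" in location:
        # URL: drop the scheme, host name, and port; keep only the path.
        path = urlsplit(location).path
    else:
        # File system path: normalize separators, then drop a drive
        # letter such as "C:".
        path = location.replace("\\", "/")
        if len(path) >= 2 and path[1] == ":":
            path = path[2:]
    # Each directory, and the filename if present, counts as one segment.
    return len([segment for segment in path.split("/") if segment])


def within_limit(location, num_pathsegments=100):
    """Assumption: the limit is inclusive; 100 is the documented default."""
    return path_length(location) <= num_pathsegments


print(path_length("http://www.spider:80/comics/fun/funny/world.html"))
print(path_length("C:\\files\\docs\\datasheets"))
```

Under these assumptions, the URL from the example would be skipped with a limit of 3 and indexed with a limit of 4.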
-refreshtime
Syntax: -refreshtime timeunits
Specifies that documents indexed within the most recent timeunits period are not refreshed.
The following is the syntax for timeunits:
n day n hour n min n sec
Where n is a positive integer. You must include the spaces. Because only the first three letters of each time
unit are parsed, you can use either the singular or the plural form of the word.
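The timeunits syntax can be illustrated with a small Python parser. This sketch is not Verity Spider's parser. The names parse_timeunits and should_refresh are hypothetical. It assumes that any subset of the four units may be given, that unit names are matched case-insensitively, and that a document becomes eligible for refresh once the full period has elapsed.

```python
from datetime import datetime, timedelta

# Map the first three letters of each unit name to a timedelta keyword.
_UNITS = {"day": "days", "hou": "hours", "min": "minutes", "sec": "seconds"}


def parse_timeunits(text):
    """Parse a string such as '1 day 6 hours' into a timedelta."""
    tokens = text.split()
    if not tokens or len(tokens) % 2 != 0:
        raise ValueError("expected 'n unit' pairs separated by spaces")
    total = timedelta()
    for number, unit in zip(tokens[0::2], tokens[1::2]):
        # Only the first three letters of the unit are examined, so
        # 'hour' and 'hours' are treated the same.
        keyword = _UNITS.get(unit[:3].lower())
        if keyword is None or not number.isdigit() or int(number) <= 0:
            raise ValueError(f"invalid time unit: {number} {unit}")
        total += timedelta(**{keyword: int(number)})
    return total


def should_refresh(last_indexed, now, timeunits):
    """Assumption: skip documents indexed within the last timeunits period."""
    return now - last_indexed >= parse_timeunits(timeunits)


now = datetime(2005, 1, 2, 12, 0, 0)
print(parse_timeunits("1 day 6 hours"))
print(should_refresh(datetime(2005, 1, 2, 9, 0, 0), now, "6 hours"))
```

Here a document indexed three hours earlier is left alone when the value is 6 hours, because it falls inside the period.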