User Guide
Chapter 9: Indexing Collections with Verity Spider
Specifies that Verity Spider follows and parses the links in, but does not index, any HTML
document that contains the text of exp within the given HTML_tag. For multiple HTML_tag and exp
combinations, use multiple instances of the -indskip option.
You can use wildcard expressions, where the asterisk (*) matches any text string and the question
mark (?) matches any single character; for example:
'/my_doc*/year199?'
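As a rough illustration of these wildcard semantics, the following Python sketch uses the standard fnmatch module, whose * and ? behave in a similar way. This is an assumption made for demonstration only: Verity Spider uses its own matcher, and fnmatch also accepts bracket sets such as [abc], which are not described here. The sample paths are hypothetical.

```python
from fnmatch import fnmatchcase

# The wildcard expression from the example above.
PATTERN = "/my_doc*/year199?"


def matches(path: str, pattern: str = PATTERN) -> bool:
    """Return True if path matches pattern (* = any string, ? = one character)."""
    return fnmatchcase(path, pattern)


# Hypothetical sample paths, chosen only to exercise the pattern.
samples = [
    "/my_docs/year1995",   # * matches "s", ? matches "5"
    "/my_doc/year1990",    # * may also match an empty string
    "/my_docs/year19955",  # two characters follow "199", so ? cannot match
    "/my_docs/year200",    # does not contain "year199"
]
for path in samples:
    print(path, matches(path))
```

The first two sample paths match and the last two do not.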
In Windows, enclose the argument in double-quotation marks to protect special characters, such
as the asterisk (*). On UNIX, use single-quotation marks. Quotation marks are required only when
you run the indexing job from a command line; they are not necessary within a command file (the
-cmdfile option).
If you use backslashes, you must double them so that they are properly escaped; for example:
C:\\test\\docs\\path
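If the paths are produced by a script, the doubling can be automated. The following is a minimal Python sketch; the helper name double_backslashes is hypothetical and is not part of Verity Spider.

```python
def double_backslashes(path: str) -> str:
    """Replace every single backslash in path with two backslashes."""
    return path.replace("\\", "\\\\")


# A raw string literal, so each backslash below is one literal backslash.
print(double_backslashes(r"C:\test\docs\path"))  # prints C:\\test\\docs\\path
```

Apply the helper only once to a given path; running it on an already doubled path doubles the backslashes again.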
To use regular expressions, also specify the -regexp option.
Example 1
To skip all HTML documents that contain the word "personnel" in the Title element, while still
parsing those documents for links to other documents, use the following:
-indskip title "personnel"
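The behavior described in Example 1 can be modeled with a toy Python sketch: the links on a page are always collected, but the page is flagged as not indexable when its Title element contains the word. The class and function names are hypothetical, the case-insensitive substring test is an assumption, and this sketch does not represent how Verity Spider is implemented.

```python
from html.parser import HTMLParser


class TitleAndLinks(HTMLParser):
    """Collect the text of the title element and the href of every anchor."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def examine(html: str, skip_word: str):
    """Return (should_index, links). Links are returned even when the page is skipped."""
    parser = TitleAndLinks()
    parser.feed(html)
    # Assumption: a case-insensitive substring test stands in for the real matching rules.
    should_index = skip_word.lower() not in parser.title.lower()
    return should_index, parser.links


page = ('<html><head><title>Personnel Directory</title></head>'
        '<body><a href="/staff.html">Staff</a></body></html>')
print(examine(page, "personnel"))  # the page is not indexed, but its link is kept
```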
Example 2
To avoid indexing directory listing pages, while still parsing the document and path links except
for the link to the parent directory, use one of the following, depending on the web server being
indexed:
• For Netscape web servers, use the following:
-indskip title "*Index of*"
-nofollow "*parent directory*"
• For Microsoft Internet Information Server, use the following:
-indskip a "*to parent directory*"
-nofollow "*parent directory*"
-maxdocsize
Syntax:
-maxdocsize integer
Specifies the maximum size, in kilobytes, for documents to be indexed. Any documents larger
than the value specified by the -maxdocsize option are ignored.
The default is to index documents of any size.
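The size check can be pictured with the following Python sketch. It assumes that one kilobyte is 1024 bytes and that a document exactly at the limit is still indexed; this guide states only that larger documents are ignored. The function name is hypothetical.

```python
from typing import Optional

# Assumption: this guide does not say whether a kilobyte is 1000 or 1024 bytes.
BYTES_PER_KB = 1024


def is_indexable(size_bytes: int, max_doc_size_kb: Optional[int] = None) -> bool:
    """Return False when the document is larger than the limit.

    A limit of None models the default, where documents of any size are indexed.
    """
    if max_doc_size_kb is None:
        return True
    return size_bytes <= max_doc_size_kb * BYTES_PER_KB


print(is_indexable(50 * 1024, 100))   # within a 100 KB limit
print(is_indexable(101 * 1024, 100))  # larger than a 100 KB limit
print(is_indexable(10**9))            # no limit given
```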
-metafile
Type: Web crawling only
Syntax:
-metafile path_and_filename