
CONFIGURING AND ADMINISTERING COLDFUSION 9
Indexing Collections with Verity Spider
Last updated 2/21/2012
Example 2
To avoid indexing directory listing pages while still parsing their document and path links (except for the link to the
parent directory), use one of the following, depending on the web server being indexed:
1. For Netscape web servers, use the following:
   -indskip title "*Index of*"
   -nofollow "*parent directory*"
2. For Microsoft Internet Information Server, use the following:
   -indskip a "*to parent directory*"
   -nofollow "*parent directory*"
-maxdocsize
Syntax
-maxdocsize integer
Specifies the maximum size, in kilobytes, for documents to be indexed. Any document larger than the value specified
by the -maxdocsize option is ignored.
The default is to index documents of any size.
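The size rule above can be illustrated with a minimal Python sketch. It is not Verity Spider's implementation; the function name should_index is hypothetical, and treating a kilobyte as 1024 bytes is an assumption that the source does not confirm.

```python
def should_index(size_bytes, max_kb=None):
    """Return True if a document of size_bytes passes a -maxdocsize style limit.

    max_kb=None models the default behavior: documents of any size are indexed.
    ASSUMPTION: one kilobyte is taken to be 1024 bytes.
    """
    if max_kb is None:
        return True
    # Only documents strictly larger than the limit are ignored.
    return size_bytes <= max_kb * 1024


if __name__ == "__main__":
    # Hypothetical document sizes, in bytes, checked against a 10 KB limit.
    for size in (512, 10 * 1024, 10 * 1024 + 1):
        print(size, should_index(size, max_kb=10))
```

With a limit of 10, a document of exactly 10 KB is still indexed, because only documents larger than the limit are skipped.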
-metafile
Type
Web crawling only
Syntax
-metafile path_and_filename
Lets you use a text file to map custom meta tags to valid HTTP header fields. If the path contains backslashes, double
each one so that it is properly escaped; for example:
C:\\test\\docs\\path
This means that you can use your own meta tag in the document to replace the value returned by the web server, or to
insert a value if the server returns nothing. Currently, the only header fields of real value are "Last-Modified" and
"Content-Length". Future enhancements could allow for greater variety.
The following is the syntax for entries in the text file:
name Last-Modified y|n
or
name Content-Length y|n
Where name is the name of the custom meta tag and y|n is an override flag, which can be yes (y) or no (n).
Example
A mapping file for the -metafile option might include the following:
Doc_Last_Touched Last-Modified n
Doc_Size Content-Length y
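The entry syntax above can be checked with a short Python sketch that parses a mapping file and shows one plausible reading of the override flag. The names parse_metafile and apply_mappings are hypothetical, and the override semantics are an assumption: here, y means the meta tag always replaces the server's value, and n means the meta tag is used only when the server returns nothing. The source does not spell this out, so treat it as an illustration rather than Verity Spider's actual behavior.

```python
VALID_HEADERS = {"Last-Modified", "Content-Length"}


def parse_metafile(text):
    """Parse 'name header y|n' entries into {name: (header, override)}."""
    mappings = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split()
        if len(parts) != 3:
            raise ValueError(f"line {lineno}: expected 'name header y|n'")
        name, header, flag = parts
        if header not in VALID_HEADERS:
            raise ValueError(f"line {lineno}: unsupported header {header!r}")
        if flag.lower() not in ("y", "n"):
            raise ValueError(f"line {lineno}: override flag must be y or n")
        mappings[name] = (header, flag.lower() == "y")
    return mappings


def apply_mappings(server_headers, meta_tags, mappings):
    """Merge meta tag values into a copy of the server's headers.

    ASSUMPTION: override=True replaces the server value; override=False
    inserts the meta tag value only if the server sent no such header.
    """
    result = dict(server_headers)
    for name, (header, override) in mappings.items():
        if name not in meta_tags:
            continue
        if override or header not in result:
            result[header] = meta_tags[name]
    return result


if __name__ == "__main__":
    mapping_text = "Doc_Last_Touched Last-Modified n\nDoc_Size Content-Length y\n"
    print(parse_metafile(mapping_text))
```

Under this reading, the Doc_Size tag would always replace the server's Content-Length, while Doc_Last_Touched would supply Last-Modified only when the server omits it.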