
Paths and URLs Options
-nodocrobo
Specifies that ROBOT META tag directives are to be ignored.
In HTML 3.0 and earlier, robot directives could be given only in the robots.txt file
under the root directory of a Web site. In HTML 4.0, every document can have robot
directives embedded in a META tag. Use this option to ignore them. This option
should, of course, be used with discretion.
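For example, a document containing a robot directive such as the following (a
hypothetical illustration) would normally be excluded from indexing; with
-nodocrobo, the directive is ignored and the document can be indexed:
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">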
See also -norobo and http://www.w3c.org/TR/REC-html40/html40.txt.
-nofollow
Syntax: -nofollow "exp"
Type: Web crawling only.
Specifies that the Verity Spider cannot follow any URLs which match the expression exp. If
you do not specify an exp value for -nofollow, the Verity Spider assumes a value of "*",
in which case no documents are followed.
You can use wildcard expressions, where the asterisk ( * ) is for text strings and the
question mark ( ? ) is for single characters. You should always encapsulate the exp
values in double quotes to ensure they are properly interpreted.
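For example, the following value (a hypothetical pattern shown for illustration)
prevents the Verity Spider from following any URL that contains the string "print":
-nofollow "*print*"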
If you use backslashes, you must double them so they are properly escaped. For
example:
C:\\test\\docs\\path
To use regular expressions, also specify the -regexp option.
Previous versions of the Verity Spider did not allow the use of an expression. This
meant that for each starting point URL, only the first document would be indexed.
With the addition of the expression functionality, you can now selectively skip URLs
even within documents.
See also -regexp.
-norobo
Type: Web crawling only.
Specifies that any robots.txt files encountered are ignored. The robots.txt file is
used on many Web sites to specify what parts of the site indexers should avoid. The
default is to honor any robots.txt files.
If you are re-indexing a site and robots.txt has changed, the Verity Spider will
delete documents that have been newly disallowed by robots.txt.
This option should, of course, be used with discretion and extreme care, especially in
conjunction with -cgiok.
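For example, a robots.txt file containing the following entries (a hypothetical
illustration) would normally keep the Verity Spider out of the /cgi-bin/ directory;
with -norobo, those URLs are followed anyway:
User-agent: *
Disallow: /cgi-bin/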
See also -nodocrobo and
http://info.webcrawler.com/mak/projects/robots/norobots.html.