User Guide
120 Chapter 9: Indexing Collections with Verity Spider
-include
Specifies that only those files, paths, and URLs that match the specified expression or expressions
will be followed. If you use backslashes, you must double them so that they are properly escaped;
for example:
C:\\test\\docs\\path
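If you generate option values from a script, the doubling can be automated. The following is a minimal Python sketch; the helper name escape_backslashes is hypothetical and is not part of Verity Spider.

```python
def escape_backslashes(path: str) -> str:
    """Return the path with every backslash doubled."""
    return path.replace("\\", "\\\\")


# A raw string keeps the single backslashes of an ordinary Windows path.
windows_path = r"C:\test\docs\path"
print(escape_backslashes(windows_path))
```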
You can use wildcard expressions, where the asterisk (*) matches any text string and the question
mark (?) matches any single character; for example:
'/my_doc*/year199?'
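To get a feel for what such an expression selects, the wildcard behavior can be approximated with Python's fnmatch module. This is only a stand-in: it assumes that Verity Spider treats * and ? the way shell-style globbing does, which this guide does not spell out in detail, and the sample paths are invented.

```python
from fnmatch import fnmatchcase

PATTERN = "/my_doc*/year199?"


def matches(path: str, pattern: str = PATTERN) -> bool:
    """Case-sensitive wildcard test: * is any string, ? is one character."""
    return fnmatchcase(path, pattern)


# Invented sample paths for illustration.
for path in ["/my_docs/year1998", "/my_doc/year1990",
             "/my_docs/year199", "/my_docs/year19981"]:
    print(path, matches(path))
```

The last two paths fail because the question mark stands for exactly one character, no more and no fewer.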
On Windows, enclose the argument in double-quotation marks to protect special characters, such
as the asterisk (*). On UNIX, use single-quotation marks. Quoting is required only when you run
the indexing job from a command line; quotation marks are not necessary within a command file
(the -cmdfile option).
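When a script assembles the command line, the quoting rule can be applied per platform. The sketch below is an illustration only: quote_argument is a hypothetical helper, the Windows branch assumes that the argument contains no double-quotation marks of its own, and the UNIX branch relies on the standard shlex.quote function.

```python
import shlex


def quote_argument(arg: str, windows: bool) -> str:
    """Quote one option value for use on a command line."""
    if windows:
        # Assumption: arg itself contains no double-quotation marks.
        return f'"{arg}"'
    # shlex.quote wraps values that contain shell-special characters
    # in single-quotation marks.
    return shlex.quote(arg)


print(quote_argument("/my_doc*/year199?", windows=True))
print(quote_argument("/my_doc*/year199?", windows=False))
```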
To use regular expressions, also specify the -regexp option.
If your starting points do not match the specified -include expressions, nothing is indexed,
because the -include option prevents Verity Spider from even following anything that does not
match the specified expressions. If that is too restrictive, consider the -indinclude option
instead: it limits what Verity Spider indexes to whatever matches the specified expressions,
without limiting what it follows.
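The difference between the two options can be modeled as a pair of yes/no decisions per URL: whether to follow it and whether to index it. The sketch below is a simplified model, not Verity Spider's implementation. It assumes that -include gates following (and therefore indexing), that -indinclude gates indexing only, and that wildcards behave like Python's fnmatch; the names Decision and decide and the example URLs are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import NamedTuple, Sequence


class Decision(NamedTuple):
    follow: bool
    index: bool


def _matches_any(url: str, patterns: Sequence[str]) -> bool:
    return any(fnmatchcase(url, p) for p in patterns)


def decide(url: str,
           include: Sequence[str] = (),
           indinclude: Sequence[str] = ()) -> Decision:
    """Model of how -include and -indinclude treat one URL."""
    # -include: a URL that does not match is not even followed.
    follow = not include or _matches_any(url, include)
    # -indinclude: a followed URL is indexed only if it matches.
    index = follow and (not indinclude or _matches_any(url, indinclude))
    return Decision(follow, index)


toc = "http://www.example.com/toc.html"
print(decide(toc, include=["*/docs/*"]))      # neither followed nor indexed
print(decide(toc, indinclude=["*/docs/*"]))   # followed, but not indexed
```

In this model, a starting point such as the table of contents above is skipped entirely under -include, which is why nothing beyond it is reached.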
For document types, use the -mimeinclude option instead; for example, specify
-mimeinclude text/html rather than -include *.htm.
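One reason to prefer the MIME type is that a file extension is only a rough proxy for the document type. The following sketch compares the two selection strategies on invented data; the URLs, the reported content types, and both function names are hypothetical, and fnmatch again stands in for the wildcard matching.

```python
from fnmatch import fnmatchcase

# Hypothetical crawl results: (URL, content type reported by the server).
documents = [
    ("http://www.example.com/index.htm", "text/html"),
    ("http://www.example.com/about.html", "text/html"),
    ("http://www.example.com/report.cfm", "text/html"),
    ("http://www.example.com/manual.pdf", "application/pdf"),
]


def select_by_extension(docs, pattern):
    """Keep URLs whose text matches a wildcard such as *.htm."""
    return [url for url, _ in docs if fnmatchcase(url, pattern)]


def select_by_mime(docs, mime_type):
    """Keep URLs whose reported content type equals mime_type."""
    return [url for url, ctype in docs if ctype == mime_type]


print(select_by_extension(documents, "*.htm"))
print(select_by_mime(documents, "text/html"))
```

With this data, the extension pattern finds one HTML document, while the MIME type finds all three.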
Note: When specifying a URL, you must use a full, absolute path in the same format that appears
in the HTML hyperlink. If the link is relative, you must change it to absolute to use it with the
-include option.
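A relative link can be converted to its absolute form by resolving it against the URL of the page that contains it. The sketch below uses the standard urllib.parse.urljoin function; the helper name to_absolute and the example URLs are hypothetical.

```python
from urllib.parse import urljoin


def to_absolute(page_url: str, href: str) -> str:
    """Resolve a hyperlink against the URL of the page that contains it."""
    return urljoin(page_url, href)


page = "http://www.example.com/docs/index.html"
print(to_absolute(page, "../reports/q1.html"))
print(to_absolute(page, "guide/intro.html"))
```

The resulting absolute URLs are the form to use in the expression.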
See also -regexp.
-indexclude
Syntax:
-indexclude exp_1 [exp_n] ...
Specifies that the files and paths in URLs that match the expressions are not indexed. They are,
however, still followed. If you use backslashes, you must double them so that they are properly
escaped; for example:
C:\\test\\docs\\path
You can use wildcard expressions, where the asterisk (*) matches any text string and the question
mark (?) matches any single character; for example:
'/my_doc*/year199?'
On Windows, enclose the argument in double-quotation marks to protect special characters, such
as the asterisk (*). On UNIX, use single-quotation marks. Quoting is required only when you run
the indexing job from a command line; quotation marks are not necessary within a command file
(the -cmdfile option).
To use regular expressions, also specify the -regexp option.
Use this option when some documents, such as HTML tables of contents, are needed only as a way
to reach other documents: Verity Spider follows their links but does not index them.
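The effect can be illustrated with a toy crawl over an in-memory link graph. This is a simplified model under stated assumptions, not Verity Spider's implementation: the SITE dictionary, its URLs, and the crawl function are hypothetical, and Python's fnmatch stands in for the wildcard matching.

```python
from collections import deque
from fnmatch import fnmatchcase

# Hypothetical site: each URL maps to the links found on that page.
SITE = {
    "http://www.example.com/toc.html": [
        "http://www.example.com/docs/a.html",
        "http://www.example.com/docs/b.html",
    ],
    "http://www.example.com/docs/a.html": [],
    "http://www.example.com/docs/b.html": [],
}


def crawl(start, indexclude):
    """Follow every reachable page; index only pages that match no exclusion."""
    followed, indexed = [], []
    queue, seen = deque([start]), {start}
    while queue:
        url = queue.popleft()
        followed.append(url)
        # An excluded page is skipped for indexing but still followed.
        if not any(fnmatchcase(url, p) for p in indexclude):
            indexed.append(url)
        for link in SITE.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return followed, indexed


followed, indexed = crawl("http://www.example.com/toc.html", ["*/toc.html"])
print("followed:", followed)
print("indexed:", indexed)
```

The table of contents appears in the followed list but not in the indexed list, while both documents it links to are indexed.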