User Guide
174 Chapter 8 Verity Spider
You cannot use the question mark ( ? ) wildcard, and the -regexp option does not
enable regular expressions for this option.
Use -indmimeexclude to allow the Verity Spider to follow documents, without
indexing them, to gain access to other desirable document types.
-mimeinclude
Syntax: -mimeinclude mime_1 [mime_n] ...
Specifies MIME types to be included.
On Windows NT, enclose the argument in double quotes to protect special characters
such as the asterisk ( * ). On UNIX, use single quotes. Quoting is required only when
you run the indexing job from a command line; quotes are not necessary within a
command file (-cmdfile).
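The UNIX quoting rule above can be illustrated with a minimal Python sketch. It assumes the executable is named vspider (the name is not given in this section) and uses shlex, which applies POSIX shell quoting only; it does not model Windows NT quoting.

```python
import shlex

# Hypothetical invocation: the executable name "vspider" is an assumption.
# The -mimeinclude option and the text/* value come from this section.
args = ["vspider", "-mimeinclude", "text/*"]

# shlex.join wraps any argument containing shell-special characters
# (such as the asterisk) in single quotes, as recommended for UNIX.
command_line = shlex.join(args)
print(command_line)

# Splitting the quoted line the way a POSIX shell would gives back
# the original arguments, with the asterisk left unexpanded.
round_trip = shlex.split(command_line)
print(round_trip)
```

Inside a command file no shell is involved, which is consistent with the statement that quotes are unnecessary there.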
The default is to include all MIME types. For the mime variable, you can include the
asterisk ( * ) wildcard for text strings. For example:
'text/*'
You cannot use the question mark ( ? ) wildcard, and the -regexp option does not
enable regular expressions for this option.
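As a rough illustration of the matching rule described above, the following Python sketch treats the asterisk as the only wildcard and every other character, including the question mark, as literal. The function name mime_matches is hypothetical, and the sketch is not Verity Spider's actual implementation.

```python
import re

def mime_matches(pattern: str, mime_type: str) -> bool:
    """Return True if mime_type matches pattern, where only '*' is a wildcard."""
    # Escape every character, then turn the escaped asterisk back into
    # "match any run of characters". '?' stays a literal question mark.
    regex = re.escape(pattern).replace(r"\*", ".*")
    return re.fullmatch(regex, mime_type) is not None

print(mime_matches("text/*", "text/html"))        # asterisk spans "html"
print(mime_matches("text/*", "application/pdf"))  # different top-level type
print(mime_matches("text/htm?", "text/html"))     # '?' is not a wildcard here
```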
-mindocsize
Syntax: -mindocsize integer
Specifies the minimum size, in kilobytes, for documents to be indexed. Any
document smaller than the value specified by -mindocsize is ignored.
The default is to index documents of any size.
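The size rule above amounts to a simple threshold check, sketched here in Python. The function name is hypothetical, and treating one kilobyte as 1024 bytes is an assumption; this section does not say which definition Verity Spider uses.

```python
BYTES_PER_KILOBYTE = 1024  # assumption: the guide does not define "kilobyte"

def passes_min_doc_size(size_in_bytes: int, min_doc_size_kb: int) -> bool:
    """Return True if a document is large enough to be indexed."""
    # Documents strictly smaller than the minimum are ignored;
    # a document exactly at the minimum is kept.
    return size_in_bytes >= min_doc_size_kb * BYTES_PER_KILOBYTE

print(passes_min_doc_size(512, 1))   # smaller than 1 KB: ignored
print(passes_min_doc_size(4096, 1))  # larger than 1 KB: indexed
```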
-skip
Syntax: -skip HTML_tag "exp"
Type: Web crawling only
Specifies that Verity Spider is not to index any HTML document that contains the text
of exp within the given HTML_tag. For multiple HTML_tag and exp combinations, use
multiple instances of the -skip option.
You can use wildcard expressions, where the asterisk ( * ) is for text strings and the
question mark ( ? ) is for single characters. For example:
'/my_doc*/year199?'
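The two wildcards described above can be sketched in Python by translating an expression into a regular expression: the asterisk matches any text string and the question mark matches exactly one character. The helper names are hypothetical, and the sketch shows only the wildcard semantics, not how Verity Spider locates the text within an HTML tag.

```python
import re

def wildcard_to_regex(expression: str) -> re.Pattern:
    """Translate a '*' / '?' wildcard expression into a compiled regex."""
    parts = []
    for char in expression:
        if char == "*":
            parts.append(".*")   # any text string, possibly empty
        elif char == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(char))  # everything else is literal
    return re.compile("".join(parts))

def wildcard_matches(expression: str, text: str) -> bool:
    """Return True if the whole text matches the wildcard expression."""
    return wildcard_to_regex(expression).fullmatch(text) is not None

print(wildcard_matches("/my_doc*/year199?", "/my_docs/year1998"))
print(wildcard_matches("/my_doc*/year199?", "/my_docs/year19981"))
```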
On Windows NT, enclose the argument in double quotes to protect special characters
such as the asterisk ( * ). On UNIX, use single quotes. Quoting is required only when
you run the indexing job from a command line; quotes are not necessary within a
command file (-cmdfile).