User Guide
124 Chapter 9: Indexing Collections with Verity Spider
-metafile
Type: Web crawling only
Syntax:
-metafile path_and_filename
Allows you to use a text file to map custom meta tags to valid HTTP header fields. This means
that you can use your own meta tag, in the document, to replace the value that the web server
returns, or to supply a value if the server returns none. Currently, the only header fields of real
value are "Last-Modified" and "Content-Length." Future enhancements could allow for greater variety.
If you use backslashes in the path, you must double them so that they are properly escaped; for
example:
C:\\test\\docs\\path
The following is the syntax for entries in the text file:
name Last-Modified y|n
or
name Content-Length y|n
Where y|n is an override flag, which can be yes or no.
Example
A mapping file for the -metafile option might include the following:
Doc_Last_Touched Last-Modified n
Doc_Size Content-Length y
If you use the y override flag, the value for the custom meta tag overrides the value for the valid
field, even if both values are present and differ. This can be useful when the valid field value is
always sent, but you want to specify your own value with a custom meta tag.
If you use the n override flag, the value for the custom meta tag is used only if there is no value for
the valid field returned by the server. If a value for the valid field exists, it is given precedence.
Note: If you have several entries mapping to the same valid field, only the last entry takes effect.
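The override rules above can be modeled in a few lines of Python. This is only an illustrative sketch of the documented behavior, not Verity Spider's implementation: the function names parse_metafile and resolve_field are hypothetical, and the handling of malformed lines (they are skipped) is an assumption that the guide does not specify.

```python
from typing import Dict, Optional, Tuple

# The two header fields the guide describes as currently useful.
VALID_FIELDS = {"Last-Modified", "Content-Length"}


def parse_metafile(text: str) -> Dict[str, Tuple[str, bool]]:
    """Parse 'name field y|n' lines into {field: (meta_tag_name, override)}.

    A later entry for the same field replaces an earlier one, mirroring
    the note that only the last entry takes effect.
    Assumption: malformed lines are skipped silently.
    """
    mapping: Dict[str, Tuple[str, bool]] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue
        name, field, flag = parts
        if field not in VALID_FIELDS or flag not in ("y", "n"):
            continue
        mapping[field] = (name, flag == "y")
    return mapping


def resolve_field(
    field: str,
    server_headers: Dict[str, str],
    meta_tags: Dict[str, str],
    mapping: Dict[str, Tuple[str, bool]],
) -> Optional[str]:
    """Return the value that would be used for a header field."""
    server_value = server_headers.get(field)
    if field not in mapping:
        return server_value
    name, override = mapping[field]
    custom_value = meta_tags.get(name)
    if custom_value is None:
        return server_value
    # y: the custom tag always wins; n: it only fills a missing server value.
    if override or server_value is None:
        return custom_value
    return server_value


mapping = parse_metafile(
    "Doc_Last_Touched Last-Modified n\n"
    "Doc_Size Content-Length y\n"
)
```

With the example mapping file, a Doc_Size meta tag replaces any Content-Length that the server sends, whereas Doc_Last_Touched is used only when the server returns no Last-Modified value.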
-mimeexclude
Syntax:
-mimeexclude mime_1 [mime_n] ...
Specifies MIME types that are neither followed nor indexed.
In Windows, include double-quotation marks around the argument to protect the special
characters, such as the asterisk (*). On UNIX, use single-quotation marks. This is only required
when you run the indexing job from a command line. Quotation marks are not necessary within
a command file (the -cmdfile option).
The default is to include all MIME types. For the mime variable, you can include the asterisk (*)
wildcard for text strings; for example:
'text/*'
You cannot use the question mark (?) wildcard, and the -regexp option does not let you use
regular expressions.
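As a rough illustration of asterisk-only matching, the following Python sketch treats * as the sole wildcard and every other character, including the question mark, as a literal. The function names mime_matches and is_excluded are hypothetical, and the case-insensitive comparison is an assumption; the guide does not describe how Verity Spider performs the match internally.

```python
import re
from typing import Iterable


def mime_matches(pattern: str, mime_type: str) -> bool:
    """Match a MIME type against a pattern in which only '*' is a wildcard."""
    # Escape each literal piece so that '?', '.', '+' and so on match
    # themselves, then join the pieces with '.*' in place of each '*'.
    literal_parts = [re.escape(part) for part in pattern.split("*")]
    regex = ".*".join(literal_parts)
    # Assumption: comparison ignores case.
    return re.fullmatch(regex, mime_type, flags=re.IGNORECASE) is not None


def is_excluded(mime_type: str, patterns: Iterable[str]) -> bool:
    """Return True if any exclusion pattern matches the MIME type."""
    return any(mime_matches(p, mime_type) for p in patterns)


excluded = ["text/*", "application/pdf"]
```

Here is_excluded("text/html", excluded) is true because of the text/* pattern, while a pattern such as text/?tml would match only the literal string text/?tml.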
Use the -indmimeexclude option to allow Verity Spider to follow documents, without indexing
them, to gain access to other desirable document types.