User Guide

130 Chapter 9: Indexing Collections with Verity Spider
You can examine the indexing job's log files for indications that files are being skipped because of
MIME types. For example, a typical ASCII file you might want indexed is a log file
(filename.log). Unless the web server understands that files with .LOG extensions are ASCII text,
of MIME type text/plain, you will see in the indexing job log file that .LOG files are skipped
because of MIME type, even if you use the following:
-mimeinclude 'text/*'
MIME types and file system indexing
When you index a file system, Verity Spider reads filenames and evaluates your MIME type
criteria against an internal, compiled list of known MIME types and associated file extensions.
You cannot edit this list. However, you can use the -mimemap option to create a custom MIME
type mapping.
When you encounter MIME types being dropped, check whether Verity Spider recognizes that
particular MIME type. For more information, see the table, “Known MIME types for file system
indexing” on page 131.
You can examine the indexing job's log files for indications that files are being skipped because of
MIME types. For example, a typical ASCII file you might want indexed is a log file
(filename.log). Because Verity Spider does not recognize that files with .LOG extensions are
ASCII text, of MIME type text/plain, you will see in the indexing job log file that .LOG files are
skipped because of MIME type, even if you use the following:
-mimeinclude 'text/*'
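The .LOG example above can be illustrated with a short Python sketch. It is not Verity Spider's implementation: the KNOWN_TYPES table is a hypothetical, tiny stand-in for the internal list, the function names are invented for illustration, and the custom_map argument only models the idea of a custom extension-to-MIME-type mapping, not the format of a -mimemap file.

```python
from fnmatch import fnmatchcase
from os.path import splitext

# Hypothetical stand-in for the spider's built-in extension table.
KNOWN_TYPES = {
    ".txt": "text/plain",
    ".htm": "text/html",
    ".html": "text/html",
}


def mime_for(filename, custom_map=None):
    """Return the MIME type for a filename, or None if the extension is unknown."""
    ext = splitext(filename)[1].lower()
    # Custom mappings (assumed here to be a plain dict) extend the built-in table.
    table = {**KNOWN_TYPES, **(custom_map or {})}
    return table.get(ext)


def is_indexed(filename, include_patterns, custom_map=None):
    """True if the file's MIME type matches any wildcard inclusion pattern."""
    mime = mime_for(filename, custom_map)
    if mime is None:
        # Unknown extension: there is no type to compare, so the file is skipped.
        return False
    return any(fnmatchcase(mime, pattern) for pattern in include_patterns)


print(is_indexed("readme.txt", ["text/*"]))                          # True
print(is_indexed("server.log", ["text/*"]))                          # False
print(is_indexed("server.log", ["text/*"], {".log": "text/plain"}))  # True
```

In this model, the inclusion pattern never sees server.log because the file has no known MIME type. Once the extension is mapped, the same pattern matches.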
Indexing unknown MIME types
Whenever you find MIME types being dropped, or you know you will be indexing files whose
extensions are not known to Verity Spider by default, use the -mimemap option to point to a file
that contains your own custom mappings for file extensions and MIME types.
You can also use the wildcard expression '*/*' for your MIME type criteria; for example:
-mimeinclude '*/*'
On either platform, you must include single quotation marks for values that include wildcard
characters.
You can also use inclusion and exclusion criteria to control precisely what is indexed, as follows:
If the list of file types you want to index is rather long, use exclusion criteria (-exclude, -indexclude,
-mimeexclude, or -indmimeexclude) to exclude extensions you know you do not want to
index; for example:
-exclude '*.exe' '*.com'
If the list of file types you want to index is relatively small, use inclusion criteria (-include,
-indinclude, -mimeinclude, or -indmimeinclude) to specify them; for example:
-include '*.txt' '*.1st' '*.log'
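The trade-off between the two approaches can be sketched in Python. This is only an illustration of wildcard filtering on filenames, not Verity Spider's actual matching logic. The select_files function and the sample file list are hypothetical, and case-insensitive matching is an assumption made for the sketch.

```python
from fnmatch import fnmatchcase


def select_files(names, include=(), exclude=()):
    """Keep names that match an include pattern (if any are given) and no exclude pattern."""
    selected = []
    for name in names:
        lowered = name.lower()  # assumption: extensions are compared case-insensitively
        if include and not any(fnmatchcase(lowered, p) for p in include):
            continue
        if any(fnmatchcase(lowered, p) for p in exclude):
            continue
        selected.append(name)
    return selected


files = ["notes.txt", "setup.exe", "build.log", "chapter.1st", "tool.com", "report.doc"]

# Long list of wanted types: name only the extensions to leave out.
print(select_files(files, exclude=["*.exe", "*.com"]))

# Short list of wanted types: name only the extensions to keep.
print(select_files(files, include=["*.txt", "*.1st", "*.log"]))
```

With exclusion criteria, report.doc is still selected because nothing rules it out. With inclusion criteria, it is dropped because it is not on the list.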