HP StorageWorks Reference Information Storage System V1.0 User Guide (May 2004)

Chapter 1: RISS Overview
RISS Concepts
1-4 HP StorageWorks Reference Information Storage System User Guide, April 2004
Understanding Searching and Document Indexing
You can search for any documents archived in your repository (or any other
repositories to which you have access), whether the documents are email
messages or files. When you search for a document, your query is checked
against an index of words that is updated each time a document is archived.
You can use the Document Manager customer option to archive files
manually. For an archived file, the index always includes at least the external
identifying information of the file, such as the file name and last modification
date. This is true for all files, regardless of file type.
With the Document Manager customer option, you can archive any type of
file. However, the system only indexes the contents of email messages and
certain types of files, referred to here as loose office documents. The contents
of other files are not indexed (only their external identifying information is
indexed).
Whether or not you have the Document Manager option, email attachments
are indexed similarly to files archived with Document Manager. The contents
of attachments that are loose office documents are indexed. Otherwise, only
the attachment (file) name is added to the index.
Indexing the contents of a document, whether email message or loose office
document, involves cataloging the document words to prepare them for later
searching. Minor words like “the” are called stop words and are not indexed.
Similarly, separators (such as punctuation) between words are ignored
during indexing.
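The indexing behavior described above can be illustrated with a minimal sketch. It is not the RISS implementation: the stop-word list, the lowercasing step, and the names tokenize and build_index are assumptions made for illustration only.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; the guide gives only "the" as an example.
STOP_WORDS = {"the", "a", "an", "and", "of", "to"}

def tokenize(text):
    """Split text on separators (punctuation, spaces) and drop stop words.

    Lowercasing is an assumption; the guide does not state case handling.
    """
    words = re.findall(r"[A-Za-z0-9]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]

def build_index(documents):
    """Map each indexed word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

# Illustrative documents, not taken from the guide.
documents = {
    "msg-1": "The quarterly report, attached.",
    "msg-2": "Report: the budget (draft)",
}
index = build_index(documents)
```

In this sketch, the word "report" maps to both messages, while "the" and the punctuation never enter the index.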
You can search the contents of a document only if the contents have been
indexed, which means you can search the contents of email messages and
loose office documents only. You can search for other kinds of files only by
using external identifying information.
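As a rough illustration of why unindexed contents cannot be found, the sketch below checks a query against a small in-memory index. The index layout, the search function, and the rule that every query word must match are assumptions; the guide does not describe how RISS evaluates queries internally.

```python
# Hypothetical index: word -> set of archived items containing that word.
# "report.doc" has its contents indexed; "vacation.jpg" is indexed only
# by its file name, so words describing the picture are absent.
index = {
    "budget": {"report.doc"},
    "report": {"report.doc"},
    "vacation": {"vacation.jpg"},
}

def search(index, query):
    """Return the items that contain every word in the query (assumed AND)."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results
```

Here search(index, "vacation") finds the image by its file name, but a query for a word that appears only inside an unindexed file returns nothing.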
Loose office document files include the following:
• Plain text files
• HTML (HyperText Markup Language) files
• End-user files used by the following Microsoft Office programs: Word, Excel, PowerPoint, and Access
• PDF (Portable Document Format) files viewed with Adobe Acrobat Reader
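The distinction between loose office documents and other files can be sketched as a simple check. The guide names file types, not file extensions, so the extension set below is an assumption, as are the function names and field names.

```python
from pathlib import PurePath

# Assumed extensions for the file types listed above; not from the guide.
LOOSE_OFFICE_EXTENSIONS = {
    ".txt", ".htm", ".html", ".doc", ".xls", ".ppt", ".mdb", ".pdf",
}

def is_loose_office_document(filename):
    """Return True if the file's extension is in the assumed set."""
    return PurePath(filename).suffix.lower() in LOOSE_OFFICE_EXTENSIONS

def indexable_fields(filename, last_modified, contents):
    """Collect what would be indexed for an archived file.

    External identifying information is always included; contents are
    included only for loose office documents.
    """
    fields = {"file_name": filename, "last_modified": last_modified}
    if is_loose_office_document(filename):
        fields["contents"] = contents
    return fields
```

For example, indexable_fields("photo.jpg", "2004-04-01", "...") yields only the file name and modification date, mirroring the rule that other file types are searchable by external identifying information alone.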