HP StorageWorks Reference Information Storage System V1.0 User Guide (May 2004)
Chapter 5: Query Syntax and Matching
Query Expression Syntax and Matching
5-6 HP StorageWorks Reference Information Storage System User Guide, April 2004
For information on Unicode 2.0, refer to the following web site:
• http://www.unicode.org
For information on ISO 8859-1, refer to the following web sites:
• http://wwwwbs.cs.tu-berlin.de/user/czyborra/charsets/
• http://www.iso.ch/
• http://www.microsoft.com/globaldev/reference/iso/28591.htm
Letters and Digits in Files
Although all letters and digits are word characters, how they are treated in
files (including email message attachments) depends on the character
encoding used. You can search for words in email message bodies and headers
regardless of their encoding.
You cannot search for words in files (including email message attachments)
unless the character encoding is ISO 8859-1 (Latin-1).
This restriction applies only to the contents of files. External information
that identifies files, such as filenames, is treated the same as message
header and body information.
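The encoding rule above can be modeled as a small decision function. This is a minimal sketch, not RISS code: the function name `is_searchable` and the part labels ("header", "body", "filename", "file_content") are hypothetical, and encoding names are normalized with Python's standard `codecs.lookup`, which maps aliases such as "latin-1" and "ISO-8859-1" to one canonical name.

```python
import codecs

# Hypothetical labels for the parts of an archived message whose words are
# searchable regardless of encoding (headers, bodies, and file identifiers
# such as filenames).
ENCODING_INDEPENDENT_PARTS = {"header", "body", "filename"}


def is_searchable(part: str, encoding: str) -> bool:
    """Hypothetical model: can words in `part`, encoded as `encoding`, be searched?"""
    if part in ENCODING_INDEPENDENT_PARTS:
        return True
    if part == "file_content":
        try:
            # codecs.lookup resolves aliases; Python's canonical name for
            # ISO 8859-1 (Latin-1) is "iso8859-1".
            return codecs.lookup(encoding).name == "iso8859-1"
        except LookupError:
            # An unrecognized encoding name is treated as not searchable.
            return False
    raise ValueError(f"unknown part: {part!r}")


print(is_searchable("body", "utf-8"))            # True
print(is_searchable("file_content", "latin-1"))  # True
print(is_searchable("file_content", "utf-8"))    # False
```

Treating unknown encoding names as unsearchable is a design choice of this sketch; the guide does not say how RISS handles undeclared or unrecognized encodings.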
Caveats Concerning Non-English Words
The following caveats apply to non-English words:
• The stop words are English only. This means, for example, that you
cannot search for the French word "but" (meaning "goal"), because "but" is
also an English stop word.
• Because separators are defined as non-word characters, the way RISS
determines searchable words is not always appropriate for non-Western
languages.
  For example, a series of Chinese letters (as defined by the Unicode
  standard) with no intervening separators (as defined by RISS) is parsed
  as a single word for purposes of indexing and querying. Semantically, a
  single Chinese letter often corresponds to a conceptual word, but it is
  not parsed as a separate word.
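Both caveats can be illustrated with a simplified tokenizer. This is a minimal sketch under stated assumptions, not the RISS indexer: it approximates "word characters" as Unicode letters and digits (Python's `\w` without the underscore), treats every other character as a separator, and uses a tiny hypothetical English stop-word list; the real RISS stop words and separator rules may differ.

```python
import re

# Hypothetical, abbreviated English stop-word list; the actual RISS list
# is not reproduced here.
STOP_WORDS = {"a", "an", "and", "but", "of", "or", "the"}

# [^\W_] matches any Unicode letter or digit (\w minus the underscore), so a
# run of such characters forms one word and anything else acts as a separator.
WORD_RE = re.compile(r"[^\W_]+")


def tokenize(text: str) -> list[str]:
    """Split text into words at runs of non-word characters."""
    return WORD_RE.findall(text)


def index_terms(text: str) -> list[str]:
    """Lowercase the words and drop any that appear in the English stop list."""
    return [w for w in (t.lower() for t in tokenize(text)) if w not in STOP_WORDS]


# Caveat 1: the French "but" is discarded because it matches an English stop word.
print(index_terms("Le but du jeu"))           # ['le', 'du', 'jeu']

# Caveat 2: Chinese letters with no separator between them form one word...
print(tokenize("信息存储系统"))                 # ['信息存储系统']
# ...so a query for the two-letter word 信息 does not match that indexed term.
print("信息" in index_terms("信息存储系统"))    # False
# A non-word character such as the fullwidth comma does split the run.
print(tokenize("信息存储，系统"))               # ['信息存储', '系统']
```

Stop-word filtering is applied after lowercasing in this sketch so that "But" and "but" are treated alike; whether RISS folds case the same way is an assumption.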