HP StorageWorks Reference Information Storage System V1.0 User Guide (May 2004)
Chapter 5:
Query Syntax and Matching
Query Expression Syntax and Matching
5-4 HP StorageWorks Reference Information Storage System User Guide, April 2004
Learning the rules of creating query words thus means learning also the rules of document indexing and therefore just what words you can search for. For example, knowing that the apostrophe character (’) is a separator means knowing that you cannot search for the English word “won’t” using the query text won’t. More precisely, the query won’t will find documents with the text “won’t,” but only because it is equivalent to the query won. The contraction ’t is removed from both the query and the document index, as a stop word (see Stop Words, on page 5-7). You cannot distinguish documents with “won’t” from documents with “won.”
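The equivalence of the queries won’t and won can be illustrated with a minimal Python sketch. It is not the product's implementation. The function name query_words and the one-entry stop-word list are hypothetical, and the sketch assumes only what is stated above: the apostrophe separates words, and the resulting t is dropped as a stop word.

```python
import re

# Assumption: a one-entry stop-word list for illustration only. The real
# list is described in the Stop Words section and is not reproduced here.
STOP_WORDS = {"t"}

def query_words(text):
    """Split on whitespace and apostrophes, lowercase, drop stop words."""
    parts = re.split(r"[\s'’]+", text.lower())
    return [p for p in parts if p and p not in STOP_WORDS]

print(query_words("won’t"))  # ['won']
print(query_words("won"))    # ['won']
```

Both calls yield the same single word, which is why the two queries cannot be told apart.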
Word Characters and Separators
Word characters include all uppercase and lowercase letters, digits, and the following additional characters:
• _ (underscore)
• # (number/pound/hash sign)
• & (ampersand)
All other characters are separators (except, in queries, the wildcards ? and *, and the special query characters ~, “, -, and !).
However, the following rule also applies:
• &&, by itself, is not a word. It is a Boolean operator. When combined with at least one more word character, && can be part of a word. For example, a&&b is a word.
Query analysis and document indexing are not case-sensitive. Uppercase and
lowercase letters are treated the same.
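The word-character rules in this section can be sketched in Python as follows. This is an illustration under stated assumptions, not the system's actual tokenizer: "letters" is taken to mean ASCII letters only, the name classify is hypothetical, and wildcards and the special query characters are ignored and simply treated as separators.

```python
import re

# Assumption: letters are ASCII only. Word characters are letters, digits,
# underscore, number sign, and ampersand; everything else separates words.
WORD_RUN = re.compile(r"[A-Za-z0-9_#&]+")

def classify(text):
    """Return (kind, token) pairs, where kind is 'word' or 'operator'."""
    result = []
    for token in WORD_RUN.findall(text):
        if token == "&&":
            # && standing alone is the Boolean operator, not a word.
            result.append(("operator", token))
        else:
            # Matching is not case-sensitive, so words are lowercased.
            result.append(("word", token.lower()))
    return result

print(classify("Cost_Center #42 && a&&b R&D"))
```

With this input the sketch reports cost_center, #42, a&&b, and r&d as words and the standalone && as an operator.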
Regular Expression Definition of English Word Characters
Note: This section provides information intended only for users familiar with regular-expression notation.