HP StorageWorks Reference Information Storage System V1.0 User Guide (May 2004)


Chapter 5: Query Expression Syntax and Matching

Query Syntax and Matching

HP StorageWorks Reference Information Storage System User Guide, April 2004 5-5

The following regular expression provides, in succinct form, a complete specification of English word characters (except for the treatment of as a non-word):

[A-Za-z0-9_#&]+
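As an illustration only, the expression above can be tried out with Python's standard `re` module. This sketch assumes that the spaces inside the brackets are page layout rather than part of the character class. The function name `english_words` is hypothetical, not part of the RISS software. The sketch does not model stop words or the non-word exception mentioned above.

```python
import re

# Character class from the text; spaces inside the brackets are assumed
# to be layout only and are left out here.
WORD_CHARS = re.compile(r"[A-Za-z0-9_#&]+")


def english_words(text):
    """Return the maximal runs of English word characters found in text.

    Hypothetical helper for illustration; not the RISS tokenizer.
    """
    return WORD_CHARS.findall(text)


if __name__ == "__main__":
    # Punctuation and whitespace separate the runs of word characters.
    print(english_words("Send the Q3_report to R&D, ref #42!"))
```

Under these assumptions, `R&D`, `Q3_report`, and `#42` each come back as a single run, because `&`, `_`, and `#` are in the character class.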

See Also

• Stop Words, on page 5-7

• Matching Words, on page 5-7

• Boolean Query Expressions, on page 5-10

Letters and Digits in Different Character Sets

Letters and Digits Defined

All letters and digits are word characters. Exactly what the RISS software considers a letter or a digit depends on the character set encoding used. For the US ASCII encoding, the letters are the uppercase and lowercase English letters (A–Z, a–z). For the ISO 8859-1 (Latin-1) encoding, used for Western European languages, accented letters are also included. Most ideographic characters, such as those used in Asian languages, are also considered letters.

Whatever the language and encoding used for a particular document (file or email message), the RISS software maps encoded characters to the Unicode 2.0 standard. The Unicode 2.0 standard is then used to determine if a given character is a letter or a digit (or neither):

• A letter is any Unicode character in one of these Unicode categories: Ll (lowercase letter), Lu (uppercase letter), Lt (titlecase letter), Lm (modifier letter), or Lo (other letter).

• A digit is any Unicode character whose Unicode name contains the word DIGIT, provided it is not in the range \u2000 (en quad = en space) through \u2FFF (ideographic description – future). This includes the digits of the following character sets: ISO 8859-1 (Latin-1), Arabic-Indic, Extended Arabic-Indic, Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam, Thai, Lao, Tibetan, and Fullwidth.
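The letter and digit rules above can be approximated with Python's standard `unicodedata` module. This is a rough sketch, not the RISS implementation. The function names `is_letter`, `is_digit`, and `classify` are hypothetical. Python ships a much newer Unicode database than Unicode 2.0, so results may differ for characters added or reclassified since then.

```python
import unicodedata

# General categories that the text counts as letters.
LETTER_CATEGORIES = {"Ll", "Lu", "Lt", "Lm", "Lo"}


def is_letter(ch):
    """True if ch falls in one of the letter categories listed in the text."""
    return unicodedata.category(ch) in LETTER_CATEGORIES


def is_digit(ch):
    """True if the Unicode name of ch contains the word DIGIT and ch lies
    outside the excluded range \\u2000 through \\u2FFF."""
    if 0x2000 <= ord(ch) <= 0x2FFF:
        return False
    # Characters without a name (for example, control codes) yield "".
    return "DIGIT" in unicodedata.name(ch, "").split()


def classify(raw, encoding):
    """Decode raw bytes to Unicode, then label each character.

    Hypothetical helper: it mirrors the 'map to Unicode first, then
    classify' order described in the text.
    """
    labels = []
    for ch in raw.decode(encoding):
        if is_letter(ch):
            labels.append((ch, "letter"))
        elif is_digit(ch):
            labels.append((ch, "digit"))
        else:
            labels.append((ch, "neither"))
    return labels


if __name__ == "__main__":
    # Byte 0xE9 is an accented e in ISO 8859-1.
    print(classify(b"\xe9t\xe9 7", "iso-8859-1"))
    print(is_digit("\u0663"))  # ARABIC-INDIC DIGIT THREE
    print(is_digit("\u2460"))  # CIRCLED DIGIT ONE, inside the excluded range
```

In this sketch, a circled digit such as \u2460 is rejected even though its name contains DIGIT, because it lies in the excluded range. The underscore is neither a letter nor a digit, although the English word-character expression shown earlier treats it as a word character.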