HP-UX Reference (11i v1 05/09) - 5 Miscellaneous Topics (vol 9)

regexp(5) regexp(5)

character, then both the expressions [[=A=]-E] and [d-a] are invalid.

An ending range point can also be the starting range point in a subsequent range

expression. Each such range expression is evaluated separately. For example, the

bracket expression [a-m-o] is treated as [a-mm-o].

The hyphen character is treated as itself if it occurs first (after an initial ^, if any) or

last in the list, or as the rightmost symbol in a range expression. As examples, the

expressions [-ac] and [ac-] are equivalent and match any of the characters a, c, or -;

the expressions [^-ac] and [^ac-] are equivalent and match any characters except

<newline>, a, c, or -; the expression [%--] matches any of the characters in the defined

collating sequence between % and - inclusive; the expression [--@] matches any of the

characters in the defined collating sequence between - and @ inclusive; and the

expression [a--@] is invalid, assuming - precedes a in the collating sequence.

If a bracket expression must specify both - and ], the ] must be placed first (after the ^,

if any) and the - last within the bracket expression.
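The hyphen and right-bracket placement rules above can be checked with the POSIX regcomp() and regexec() routines. The following is only a sketch: the helper name bre_matches() is hypothetical and not part of any library, and the results depend on the collating sequence, assumed here to be that of the C locale (the code makes no setlocale() call).

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not a library routine): returns 1 if the basic RE
 * `pattern` matches somewhere in `text`, 0 if it does not, and -1 if the
 * pattern fails to compile. */
int bre_matches(const char *pattern, const char *text)
{
    regex_t re;

    /* Without REG_EXTENDED, regcomp() uses basic RE syntax. REG_NOSUB is
     * set because only a yes/no answer is needed. */
    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return -1;

    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

For instance, bre_matches("[-ac]", "-") and bre_matches("[ac-]", "-") should both report a match, bre_matches("[^-ac]", "-") should not, and bre_matches("[]a-]", "]") shows the ] placed first and the - placed last.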

character class

A character class expression represents the set of characters belonging to a character

class, as defined via the most current setting of the locale category LC_CTYPE. It is

expressed as a character class name enclosed within bracket-colon ([: :]) delimiters.

Standard character class expressions supported in all locales are:

[:alpha:]     letters
[:upper:]     upper-case letters
[:lower:]     lower-case letters
[:digit:]     decimal digits
[:xdigit:]    hexadecimal digits
[:alnum:]     letters or decimal digits
[:space:]     characters producing white-space in displayed text
[:print:]     printing characters
[:punct:]     punctuation characters
[:graph:]     characters with a visible representation
[:cntrl:]     control characters
[:blank:]     blank characters

For example, if the locale category LC_CTYPE is set to the C locale, the expression

[[:upper:]] is equivalent to [A-Z]. Similarly, the expression [[:digit:]]

is the same as [0-9].
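The equivalences stated for the C locale can be spot-checked by comparing two bracket expressions over every ASCII character. This is a sketch under stated assumptions: same_ascii_set() is a hypothetical helper name, only the characters 1 through 127 are examined, and the program is assumed to run in the default C locale.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the basic REs p1 and p2 accept exactly
 * the same single ASCII characters (codes 1..127), 0 if they differ on at
 * least one character, and -1 if either pattern fails to compile. */
int same_ascii_set(const char *p1, const char *p2)
{
    regex_t a, b;

    if (regcomp(&a, p1, REG_NOSUB) != 0)
        return -1;
    if (regcomp(&b, p2, REG_NOSUB) != 0) {
        regfree(&a);
        return -1;
    }

    int same = 1;
    for (int c = 1; c < 128 && same; c++) {
        char s[2] = { (char)c, '\0' };   /* one-character test string */
        int in_a = regexec(&a, s, 0, NULL, 0) == 0;
        int in_b = regexec(&b, s, 0, NULL, 0) == 0;
        if (in_a != in_b)
            same = 0;
    }

    regfree(&a);
    regfree(&b);
    return same;
}
```

In the C locale, same_ascii_set("[[:upper:]]", "[A-Z]") and same_ascii_set("[[:digit:]]", "[0-9]") should both return 1. In other locales the class membership follows LC_CTYPE and the results may differ.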

REs Matching Multiple Characters

The following rules may be used to construct REs matching multiple characters from REs matching a single

character:

RE RE The concatenation of REs is an RE that matches the first encountered concatenation

of the strings matched by each component of the RE. For example, the RE bc

matches the second and third characters of the string abcdefabcdef.
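The "first encountered" rule for concatenation can be observed through the match offsets that regexec() reports. The sketch below assumes a hypothetical helper, match_pos(), which converts the 0-based rm_so and rm_eo offsets into the 1-based character positions used in the text.

```c
#include <regex.h>

/* Hypothetical helper: finds the first match of the basic RE `pattern` in
 * `text`. Returns the 1-based position of the first matched character when
 * want_end is 0, or of the last matched character when want_end is nonzero.
 * Returns 0 if there is no match and -1 if the pattern fails to compile.
 * (An empty match would report an end position one less than its start.) */
int match_pos(const char *pattern, const char *text, int want_end)
{
    regex_t re;
    regmatch_t m[1];

    if (regcomp(&re, pattern, 0) != 0)   /* cflags 0: basic RE syntax */
        return -1;

    int rc = regexec(&re, text, 1, m, 0);
    regfree(&re);
    if (rc != 0)
        return 0;

    /* rm_so is the 0-based offset of the first matched byte; rm_eo is the
     * offset just past the last one, which equals the 1-based end position. */
    return want_end ? (int)m[0].rm_eo : (int)m[0].rm_so + 1;
}
```

With the string from the example, match_pos("bc", "abcdefabcdef", 0) returns 2 and match_pos("bc", "abcdefabcdef", 1) returns 3; the later occurrence at positions 8 and 9 is not reported.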

RE* An RE matching a single character followed by an asterisk (*) is an RE that matches

zero or more occurrences of the RE preceding the asterisk. The first encountered

string that permits a match is chosen, and the matched string will encompass the

maximum number of characters permitted by the RE. For example, in the string

abbbcdeabbbbbbcde, both the RE b*c and the RE bbb*c are matched by the

substring bbbc in the second through fifth positions. An asterisk as the first character of

an RE loses this special meaning and is treated as itself.
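The asterisk rules (first match, then longest match, and a leading asterisk taken literally) can be illustrated by extracting the matched substring. This is a sketch only: matched_text() is a hypothetical helper, and the treatment of a leading asterisk shown in the usage note is the behavior described above, as implemented by a conforming regcomp() in basic RE mode.

```c
#include <regex.h>
#include <string.h>

/* Hypothetical helper: copies the first substring of `text` matched by the
 * basic RE `pattern` into `out` (truncated to fit, always NUL-terminated)
 * and returns the full length of the match. Returns -1 if there is no
 * match, the pattern fails to compile, or outsize is 0. */
int matched_text(const char *pattern, const char *text,
                 char *out, size_t outsize)
{
    regex_t re;
    regmatch_t m[1];

    if (outsize == 0 || regcomp(&re, pattern, 0) != 0)
        return -1;

    int rc = regexec(&re, text, 1, m, 0);
    regfree(&re);
    if (rc != 0)
        return -1;

    size_t len = (size_t)(m[0].rm_eo - m[0].rm_so);
    size_t n = len < outsize - 1 ? len : outsize - 1;
    memcpy(out, text + m[0].rm_so, n);   /* copy only the matched bytes */
    out[n] = '\0';
    return (int)len;
}
```

Given a buffer buf of 32 bytes, both matched_text("b*c", "abbbcdeabbbbbbcde", buf, sizeof buf) and the same call with "bbb*c" leave bbbc in buf, and matched_text("*a", "x*a", buf, sizeof buf) leaves *a in buf because the leading asterisk is treated as itself.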

\(RE\) A subexpression can be defined within an RE by enclosing it between the character

pairs \( and \). Such a subexpression matches whatever it would have matched

without the \( and \). Subexpressions can be arbitrarily nested. An asterisk

immediately following the \( loses its special meaning and is treated as itself. An asterisk

HP-UX 11i Version 1: September 2005 − 3 − Hewlett-Packard Company Section 5−−301