HP-UX Reference (11i v1 05/09) - 5 Miscellaneous Topics (vol 9)
regexp(5) regexp(5)
character, then both the expressions [[=A=]-E] and [d-a] are invalid.
An ending range point can also be the starting range point in a subsequent range
expression. Each such range expression is evaluated separately. For example, the
bracket expression [a-m-o] is treated as [a-mm-o].
The hyphen character is treated as itself if it occurs first (after an initial ^, if any) or
last in the list, or as the rightmost symbol in a range expression. As examples, the
expressions [-ac] and [ac-] are equivalent and match any of the characters a, c, or -;
the expressions [^-ac] and [^ac-] are equivalent and match any characters except
<newline>, a, c, or -; the expression [%--] matches any of the characters in the defined
collating sequence between % and - inclusive; the expression [--@] matches any of the
characters in the defined collating sequence between - and @ inclusive; and the
expression [a--@] is invalid, assuming - precedes a in the collating sequence.
If a bracket expression must specify both - and ], the ] must be placed first (after the ^,
if any) and the - last within the bracket expression.
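The placement rules for - and ] can be tried out with a short sketch. It uses
Python's re module, which is an assumption of this example and not the regexp(5)
implementation. Python's bracket syntax agrees with the rules above for these
particular patterns but differs elsewhere (for instance, a non-matching list in
Python also matches <newline>), so only the literal-hyphen and leading-] cases
are shown. The helper name matched_chars is hypothetical.

```python
import re


def matched_chars(pattern, text):
    """Return, in order, every character of text matched by a
    one-character bracket expression (hypothetical helper)."""
    return re.findall(pattern, text)


sample = "a-b]c"

# A hyphen that is first or last in the list stands for itself,
# so these two bracket expressions select the same characters.
print(matched_chars(r"[-ac]", sample))
print(matched_chars(r"[ac-]", sample))

# To list both ] and -, place ] first and - last.
print(matched_chars(r"[]a-]", sample))

# The same placement applies after the initial ^ of a non-matching list.
print(matched_chars(r"[^]a-]", sample))
```

In each pattern the hyphen never sits between two range endpoints, which is why
it is read as an ordinary character.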
character class
A character class expression represents the set of characters belonging to a character
class, as defined via the most current setting of the locale category LC_CTYPE. It is
expressed as a character class name enclosed within bracket-colon ([: :]) delimiters.
Standard character class expressions supported in all locales are:
[:alpha:] letters
[:upper:] upper-case letters
[:lower:] lower-case letters
[:digit:] decimal digits
[:xdigit:] hexadecimal digits
[:alnum:] letters or decimal digits
[:space:] characters producing white-space in displayed text
[:print:] printing characters
[:punct:] punctuation characters
[:graph:] characters with a visible representation
[:cntrl:] control characters
[:blank:] blank characters
For example, if the locale category LC_CTYPE is set to the C locale, the expression
[[:upper:]] is equivalent to [A-Z]. Similarly, the expression [[:digit:]]
is equivalent to [0-9].
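As a rough illustration of the C-locale case, the sketch below tabulates the
members of each standard class over the ASCII range. It is written in Python,
whose re module does not accept the [: :] syntax, so the classes are modelled as
plain sets. The table C_LOCALE_CLASSES and the helpers in_class and
class_members are hypothetical names, and the memberships are an assumption
based on the usual C-locale definitions. Other locales may differ.

```python
import string

# Assumed C-locale (ASCII) membership for each standard class name.
C_LOCALE_CLASSES = {
    "alpha": set(string.ascii_letters),
    "upper": set(string.ascii_uppercase),
    "lower": set(string.ascii_lowercase),
    "digit": set(string.digits),
    "xdigit": set(string.hexdigits),
    "alnum": set(string.ascii_letters + string.digits),
    "space": set(string.whitespace),
    "punct": set(string.punctuation),
    "graph": set(string.ascii_letters + string.digits + string.punctuation),
    "print": set(string.ascii_letters + string.digits
                 + string.punctuation + " "),
    "cntrl": {chr(c) for c in range(32)} | {chr(127)},
    "blank": {" ", "\t"},
}


def in_class(name, ch):
    """True if the single character ch belongs to class name."""
    return ch in C_LOCALE_CLASSES[name]


def class_members(name):
    """Members of class name as a string sorted by code point."""
    return "".join(sorted(C_LOCALE_CLASSES[name]))


# In this model, [[:upper:]] lists exactly the characters of [A-Z]
# and [[:digit:]] exactly those of [0-9].
print(class_members("upper"))
print(class_members("digit"))
```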
REs Matching Multiple Characters
The following rules may be used to construct REs matching multiple characters from REs matching a single
character:
RE RE The concatenation of REs is an RE that matches the first encountered concatenation
of the strings matched by each component of the RE. For example, the RE bc
matches the second and third characters of the string abcdefabcdef.
RE* An RE matching a single character followed by an asterisk (*) is an RE that matches
zero or more occurrences of the RE preceding the asterisk. The first encountered
string that permits a match is chosen, and the matched string will encompass the
maximum number of characters permitted by the RE. For example, in the string
abbbcdeabbbbbbcde, both the RE b*c and the RE bbb*c are matched by the
substring bbbc in the second through fifth positions. An asterisk as the first character of
an RE loses this special meaning and is treated as itself.
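The two worked examples above (concatenation and the asterisk) can be checked
with a small sketch. It uses Python's re module as a stand-in, which is an
assumption: Python is not a regexp(5) implementation, but for these simple
patterns its leftmost, greedy search gives the same result as the rule
described here. A leading asterisk is rejected by Python rather than treated as
itself, so that case is not shown. The helper first_match is a hypothetical
name.

```python
import re


def first_match(pattern, text):
    """Return (first_position, last_position, matched_string) for the
    first match of pattern in text, using 1-based inclusive positions
    as in the prose above; return None when nothing matches."""
    m = re.search(pattern, text)
    if m is None:
        return None
    # m.start() is 0-based; m.end() is one past the last matched
    # character, which equals the 1-based position of that character.
    return (m.start() + 1, m.end(), m.group())


# Concatenation: bc matches the second and third characters.
print(first_match(r"bc", "abcdefabcdef"))

# Asterisk: both REs select the first, longest candidate, bbbc.
print(first_match(r"b*c", "abbbcdeabbbbbbcde"))
print(first_match(r"bbb*c", "abbbcdeabbbbbbcde"))
```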
\(RE\) A subexpression can be defined within an RE by enclosing it between the character
pairs \( and \). Such a subexpression matches whatever it would have matched
without the \( and \). Subexpressions can be arbitrarily nested. An asterisk
immediately following the \( loses its special meaning and is treated as itself. An asterisk
HP-UX 11i Version 1: September 2005 − 3 − Hewlett-Packard Company Section 5−301