Specifications
TABLE 4.3 Character Classes for Use in POSIX-Style Regular Expressions
Class         Matches
[[:alnum:]]   Alphanumeric characters
[[:alpha:]]   Alphabetic characters
[[:lower:]]   Lowercase letters
[[:upper:]]   Uppercase letters
[[:digit:]]   Decimal digits
[[:xdigit:]]  Hexadecimal digits
[[:punct:]]   Punctuation
[[:blank:]]   Tabs and spaces
[[:space:]]   Whitespace characters
[[:cntrl:]]   Control characters
[[:print:]]   All printable characters
[[:graph:]]   All printable characters except for space
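To experiment with the classes in Table 4.3 outside a POSIX regular expression engine, the following minimal sketch translates them into explicit character ranges. It is written in Python, whose re module does not recognize POSIX bracket classes. The mapping assumes the ASCII-only C locale, and the names POSIX_CLASSES and posix_to_python are hypothetical, introduced only for illustration.

```python
import re
import string

# ASCII (C locale) equivalents of the POSIX classes. This is an assumption:
# a real POSIX engine consults the current locale instead.
POSIX_CLASSES = {
    "alnum": "A-Za-z0-9",
    "alpha": "A-Za-z",
    "lower": "a-z",
    "upper": "A-Z",
    "digit": "0-9",
    "xdigit": "0-9A-Fa-f",
    "punct": re.escape(string.punctuation),
    "blank": " \\t",
    "space": " \\t\\n\\r\\f\\v",
    "cntrl": "\\x00-\\x1f\\x7f",
    "print": "\\x20-\\x7e",
    "graph": "\\x21-\\x7e",
}


def posix_to_python(pattern):
    """Replace each [:name:] token with its ASCII character range.

    The outer brackets of the bracket expression are left in place, so
    "[[:alnum:]]" becomes "[A-Za-z0-9]". Unknown names raise KeyError.
    """
    return re.sub(r"\[:(\w+):\]", lambda m: POSIX_CLASSES[m.group(1)], pattern)


hex_word = re.compile(posix_to_python("^[[:xdigit:]]+$"))
print(bool(hex_word.match("CAFE42")))  # True
print(bool(hex_word.match("coffee")))  # False: "o" is not a hex digit
```

Because only the inner [:name:] token is replaced, a class can still be combined with other characters in the same brackets, for example [[:alpha:]_].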
Repetition
Often you want to specify that there might be multiple occurrences of a particular string or
class of character. You can represent this using two special characters in your regular
expression. The * symbol means that the pattern can be repeated zero or more times, and the
+ symbol means that the pattern can be repeated one or more times. The symbol should appear
directly after the part of the expression that it applies to. For example,
[[:alnum:]]+
means “at least one alphanumeric character.”
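The difference between * and + shows up most clearly on the empty string. The sketch below is in Python, which lacks POSIX bracket classes, so it assumes the ASCII range [A-Za-z0-9] as a stand-in for [[:alnum:]]. The helper matches_whole is a hypothetical name used only here.

```python
import re

# ASCII stand-in for [[:alnum:]]; an assumption, since Python's re module
# does not support POSIX bracket classes.
ALNUM = "[A-Za-z0-9]"

one_or_more = re.compile(ALNUM + "+")   # at least one alphanumeric character
zero_or_more = re.compile(ALNUM + "*")  # any number, including none


def matches_whole(regex, text):
    """Return True if the entire text matches the compiled pattern."""
    return regex.fullmatch(text) is not None


print(matches_whole(one_or_more, "abc123"))   # True
print(matches_whole(one_or_more, ""))         # False: + needs one character
print(matches_whole(zero_or_more, ""))        # True: * accepts zero repeats
print(matches_whole(one_or_more, "abc 123"))  # False: space is not alphanumeric
```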
Subexpressions
It’s often useful to be able to split an expression into subexpressions so you can, for example,
represent “at least one of these strings followed by exactly one of those.” You can do this using
parentheses, exactly the same way as you would in an arithmetic expression. For example,
(very )*large
matches “large”, “very large”, “very very large”, and so on.
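As a quick check of how grouping changes what the * applies to, the following sketch compares the pattern above with an ungrouped variant. It uses Python's re module, which handles parentheses and * the same way for this simple pattern. The names size and ungrouped are hypothetical.

```python
import re

# The parentheses make * repeat the whole string "very ".
size = re.compile(r"(very )*large")

# Without parentheses, * applies only to the space before it.
ungrouped = re.compile(r"very *large")

for phrase in ["large", "very large", "very very large", "verylarge"]:
    print(repr(phrase), size.fullmatch(phrase) is not None)
# 'large' True
# 'very large' True
# 'very very large' True
# 'verylarge' False

print(ungrouped.fullmatch("very very large") is not None)  # False
print(ungrouped.fullmatch("very   large") is not None)     # True
```

The ungrouped form accepts any number of spaces between a single "very" and "large", which is rarely what is intended.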