Specifications

Character Sets and Classes
Using character sets immediately gives regular expressions more power than exact matching
expressions. Character sets can be used to match any character of a particular type; they're
really a kind of wildcard.
First of all, you can use the . character as a wildcard for any single character except a
newline (\n). For example, the regular expression
.at
matches the strings “cat”, “sat”, and “mat”, among others.
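As a quick check of how the . wildcard behaves, here is a minimal sketch. It assumes Python's re module purely for illustration (the pattern itself is the one shown above), and the list of sample strings is made up for the example.

```python
import re

# The pattern from the text: any single character (except a newline), then "at".
pattern = re.compile(r".at")

# Hypothetical sample strings, chosen to show what the wildcard accepts and rejects.
candidates = ["cat", "sat", "mat", "#at", "at", "\nat"]

# fullmatch() requires the whole string to match the pattern.
matches = [s for s in candidates if pattern.fullmatch(s)]

print(matches)
```

The string "at" is rejected because the . must consume one character, and "\nat" is rejected because . does not match a newline by default.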
This kind of wildcard matching is often used for filename matching in operating systems.
With regular expressions, however, you can be more specific about the type of character you
would like to match, and you can actually specify a set that a character must belong to. In the
previous example, the regular expression matches “cat” and “mat”, but also matches “#at”. If
you want to limit this to a character between a and z, you can specify it as follows:
[a-z]at
Anything enclosed in the special square bracket characters [ and ] is a character class: a set of
characters to which a matched character must belong. Note that the expression in the square
brackets matches only a single character.
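To see that a class in square brackets stands for exactly one character, the sketch below puts [a-z] in front of the literal at from the earlier example. Python's re module and the sample strings are assumptions made for illustration only.

```python
import re

# One lowercase letter from a to z, followed by the literal text "at".
lowercase_at = re.compile(r"[a-z]at")

# Hypothetical test strings; each is checked against the whole pattern.
samples = ["cat", "mat", "#at", "Cat", "chat"]
results = {s: bool(lowercase_at.fullmatch(s)) for s in samples}

for text, matched in results.items():
    print(repr(text), matched)
```

Here "#at" and "Cat" fail because their first character is not in the range a to z, and "chat" fails because the class accounts for only one character, not two.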
You can list the members of a set explicitly. For example,
[aeiou]
matches any lowercase vowel.
You can also describe a range, as we just did using the special hyphen character, or a set of
ranges:
[a-zA-Z]
This set of ranges stands for any alphabetic character in upper- or lowercase.
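The difference between a listed set and a set of ranges can be sketched as follows. This again assumes Python's re module for illustration, and the sample text is hypothetical.

```python
import re

# A listed set: any one of the five lowercase vowels.
vowel = re.compile(r"[aeiou]")

# A set of ranges: any one letter, lowercase or uppercase.
letter = re.compile(r"[a-zA-Z]")

sample = "PHP 4 rocks"

# findall() returns every single-character match, in order.
vowels_found = vowel.findall(sample)
letters_found = letter.findall(sample)

print(vowels_found)
print(letters_found)
```

Only the o is reported as a vowel, whereas every letter is picked up by the second class; the space and the digit match neither.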
You can also use sets to specify that a character cannot be a member of a set. For example,
[^a-z]
matches any character that is not between a and z. The caret symbol means not when it is
placed as the first character inside the square brackets. It has another meaning when used
outside square brackets, which we'll look at in a minute.
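A short sketch of a negated set follows, assuming Python's re module for illustration and using made-up sample strings. It also shows that the caret negates the set only when it comes first inside the brackets; elsewhere in the set it is an ordinary character.

```python
import re

# Negated set: any one character that is NOT a lowercase letter a to z.
not_lowercase = re.compile(r"[^a-z]")
non_lower = not_lowercase.findall("Page 110")

# Caret not in first position: the set contains "a", "^", and "z" literally.
literal_caret = re.findall(r"[a^z]", "a^b")

print(non_lower)
print(literal_caret)
```

The first result contains the uppercase P, the space, and the three digits; the second contains a and the caret itself, but not b.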
In addition to listing out sets and ranges, a number of predefined character classes can be used
in a regular expression. These are shown in Table 4.3.