
Meta-Character    Matches
\A                The beginning of the input
\G                The end of the previous match
\Z                The end of the input, ignoring a final line terminator (such as a carriage return or linefeed)
\z                The end of the input
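The boundary matchers above can be tried out with a short program. What follows is only a minimal sketch: the class name BoundaryMatcherDemo and the helper methods found and leadingDigits are made up for illustration, and the only library calls used are Pattern.compile, Matcher.find, and Matcher.group from java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
class BoundaryMatcherDemo {

    // Returns true if the pattern is found anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // Collects the digits at the start of the input one at a time. \G anchors
    // each match to the end of the previous one, so matching stops at the
    // first character that is not a digit.
    static List<String> leadingDigits(String input) {
        List<String> result = new ArrayList<String>();
        Matcher m = Pattern.compile("\\G\\d").matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "end\n";
        System.out.println(found("\\Aend", input));  // true: "end" starts the input
        System.out.println(found("end\\Z", input));  // true: \Z ignores the final line terminator
        System.out.println(found("end\\z", input));  // false: a linefeed follows "end"
        System.out.println(leadingDigits("123a45")); // [1, 2, 3]
    }
}
```

The difference between \Z and \z only shows up when the input ends with a line terminator, which is why the sample input ends with a linefeed.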
Regular expression languages also have character classes, which specify a set of characters, any one of
which can match a single character in the string you want to match. To specify a character class
explicitly, place the characters between square brackets. For example, the character class [0123456789]
matches any single digit. It is also possible to specify “any character except one of these” by placing
a caret immediately after the opening square bracket. The expression [^012] matches any single character
except 0, 1, and 2 (including characters that are not digits). You can specify character ranges using
the hyphen. The character class [a-z] matches any single lowercase letter, and [^a-z] matches any
character except a lowercase letter. Any character range can be used, such as [0-9] to match a single
digit, or [0-3] to match a 0, 1, 2, or 3. Multiple ranges can be combined, such as [a-zA-Z] to match any
single letter. The regular expression package contains a set of predefined character classes, and these
are listed in the following table.
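Before turning to the predefined classes, the explicit character classes described above can be checked with a small test program. This is a sketch rather than a definitive implementation: the class CharacterClassDemo and its matches helper are hypothetical names, and the helper simply wraps Pattern.matches, which tests whether the entire input matches the expression.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
class CharacterClassDemo {

    // Returns true if the entire input matches the regular expression.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("[0123456789]", "7")); // true: explicit list of digits
        System.out.println(matches("[^012]", "1"));       // false: 1 is excluded
        System.out.println(matches("[^012]", "x"));       // true: a negated class is not limited to digits
        System.out.println(matches("[a-z]", "q"));        // true: lowercase range
        System.out.println(matches("[^a-z]", "Q"));       // true: anything except a lowercase letter
        System.out.println(matches("[a-zA-Z]", "Q"));     // true: two ranges in one class
        System.out.println(matches("[0-3]", "4"));        // false: 4 is outside the range
    }
}
```

Each character class consumes exactly one character, so every input in this sketch is a single-character string.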
Character Class Meta-Character    Matches
.                                 Any single character (by default, excluding line terminators)
\d                                A digit [0-9]
\D                                A nondigit [^0-9]
\s                                A whitespace character [ \t\n\x0B\f\r]
\S                                A nonwhitespace character [^\s]
\w                                A word character [a-zA-Z_0-9]
\W                                A nonword character [^\w]
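The predefined classes in the table can be exercised in the same way. The following is a minimal sketch; PredefinedClassDemo, matches, and collapseWhitespace are hypothetical names chosen for this example. Note that each backslash must be doubled inside a Java string literal, so \d is written as "\\d".

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
class PredefinedClassDemo {

    // Returns true if the entire input matches the regular expression.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Replaces each run of whitespace characters with a single space.
    static String collapseWhitespace(String input) {
        return input.replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        System.out.println(matches("\\d", "5"));          // true: 5 is a digit
        System.out.println(matches("\\D", "5"));          // false: \D excludes digits
        System.out.println(matches("\\w+", "user_42"));   // true: letters, underscore, and digits
        System.out.println(matches("\\W", "-"));          // true: a hyphen is not a word character
        System.out.println(matches(".", "\n"));           // false: no line terminators by default
        System.out.println(collapseWhitespace("a \t\n b")); // a b
    }
}
```

The dot matches a line terminator only when the pattern is compiled with the Pattern.DOTALL flag.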
Additionally, there are POSIX character classes and Java character classes. These are listed in the
following tables, respectively.
Character Class Meta-Character    Matches
\p{Lower}                         Lowercase letter [a-z]
\p{Upper}                         Uppercase letter [A-Z]
\p{ASCII}                         All ASCII [\x00-\x7F]
Chapter 1: Key Java Language Features and Libraries