Datasheet

Character Class Meta-Character    Matches
\p{Alpha}                         Any lowercase or uppercase letter
\p{Digit}                         A digit [0-9]
\p{Alnum}                         Any letter or digit
\p{Punct}                         Punctuation: one of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
\p{Graph}                         A visible character: any letter, digit, or punctuation
\p{Print}                         A printable character: \p{Graph} plus the space character
\p{Blank}                         A space or tab [ \t]
\p{Cntrl}                         A control character [\x00-\x1F\x7F]
\p{XDigit}                        Hexadecimal digit [0-9a-fA-F]
\p{Space}                         A whitespace character [ \t\n\x0B\f\r]
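
As a minimal sketch of how these classes behave in java.util.regex, the following example checks a few inputs against them. The class name PosixClassDemo and the helper matchesAll are illustrative names, not part of any library. Note that backslashes must be doubled inside Java string literals, and that by default these classes cover only US-ASCII characters.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
class PosixClassDemo {
    // Returns true when the entire input matches the regular expression.
    static boolean matchesAll(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matchesAll("\\p{Alpha}+", "Java"));       // true
        System.out.println(matchesAll("\\p{Alnum}+", "Java5"));      // true
        System.out.println(matchesAll("\\p{Digit}+", "12a"));        // false
        System.out.println(matchesAll("\\p{XDigit}+", "CAFEbabe"));  // true
        System.out.println(matchesAll("\\p{Punct}", "?"));           // true

        // The space character is printable but not visible.
        System.out.println(matchesAll("\\p{Print}", " "));           // true
        System.out.println(matchesAll("\\p{Graph}", " "));           // false

        // Without extra flags, a non-ASCII letter such as U+00E9 is not matched.
        System.out.println(matchesAll("\\p{Alpha}", "\u00e9"));      // false
    }
}
```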
Character Class       Matches
\p{javaLowerCase}     Everything that Character.isLowerCase() matches
\p{javaUpperCase}     Everything that Character.isUpperCase() matches
\p{javaWhitespace}    Everything that Character.isWhitespace() matches
\p{javaMirrored}      Everything that Character.isMirrored() matches
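
The correspondence between these classes and the java.lang.Character methods can be checked directly. The sketch below assumes a small illustrative helper, inClass, in a hypothetical class JavaCharClassDemo, and prints the regex result next to the result of the matching Character method for a handful of sample characters. The two values in each pair should agree.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
class JavaCharClassDemo {
    // Returns true when the single character c matches the given class expression.
    static boolean inClass(String charClass, char c) {
        return Pattern.matches(charClass, String.valueOf(c));
    }

    public static void main(String[] args) {
        // U+00C9 is an accented capital E; U+00A0 is the non-breaking space.
        char[] samples = {'a', 'Z', '\u00c9', ' ', '\u00a0', '(', '7'};
        for (char c : samples) {
            // Each pair shows the regex result, then the Character method result.
            System.out.printf("U+%04X lower=%b/%b upper=%b/%b space=%b/%b mirrored=%b/%b%n",
                    (int) c,
                    inClass("\\p{javaLowerCase}", c), Character.isLowerCase(c),
                    inClass("\\p{javaUpperCase}", c), Character.isUpperCase(c),
                    inClass("\\p{javaWhitespace}", c), Character.isWhitespace(c),
                    inClass("\\p{javaMirrored}", c), Character.isMirrored(c));
        }
    }
}
```

Unlike the ASCII-only classes in the previous table, these classes follow the Unicode-aware Character methods, so an accented uppercase letter matches \p{javaUpperCase}, while the non-breaking space does not match \p{javaWhitespace}.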
Another feature of the regular expression language is the ability to match a particular character a
specified number of times. In the previous example, the asterisk was used to match zero or more
whitespace characters. The repetition operators work in two general ways. One class of operators is
greedy: they match as much of the input as they can while still allowing the rest of the pattern to
match. The other class is reluctant (or lazy): they match as little as they can, stopping at the first
point where the rest of the pattern matches. For example, the regular expression .*; matches any number
of characters up to the last semicolon it finds. To match only up to the first semicolon, the reluctant
version .*?; must be used. The greedy operators and their reluctant versions are listed in the following
two tables, respectively.
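
The difference between .*; and .*?; is easiest to see on a concrete input. The following is a minimal sketch; the class GreedyVsReluctant, the helper firstMatch, and the sample input string are all illustrative choices rather than anything taken from the text above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names and sample input are illustrative only.
class GreedyVsReluctant {
    // Returns the first substring of input matched by regex, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "int a; int b; int c; // done";

        // Greedy: .* first consumes the whole line, then backs off to the last semicolon.
        System.out.println(firstMatch(".*;", input));   // int a; int b; int c;

        // Reluctant: .*? grows one character at a time and stops at the first semicolon.
        System.out.println(firstMatch(".*?;", input));  // int a;
    }
}
```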
Greedy Operator    Description
X?                 Matches X zero or one time
X*                 Matches X zero or more times
X+                 Matches X one or more times
X{n}               Matches X exactly n times, where n is a non-negative integer
X{n,}              Matches X at least n times
X{n,m}             Matches X at least n, but no more than m times
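
Each operator in the table above can be tried out with a short program. The sketch below uses whole-input matching, so a result of true means the entire string fits the pattern. The class QuantifierDemo, the helper matchesAll, and the sample patterns are illustrative.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names and sample patterns are illustrative only.
class QuantifierDemo {
    // Returns true when the entire input matches the regular expression.
    static boolean matchesAll(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // X? : zero or one time
        System.out.println(matchesAll("colou?r", "color"));    // true
        System.out.println(matchesAll("colou?r", "colour"));   // true

        // X* : zero or more times; X+ : one or more times
        System.out.println(matchesAll("ab*", "a"));            // true
        System.out.println(matchesAll("ab+", "a"));            // false
        System.out.println(matchesAll("ab+", "abbb"));         // true

        // X{n} : exactly n times
        System.out.println(matchesAll("\\d{4}", "2006"));      // true
        System.out.println(matchesAll("\\d{4}", "206"));       // false

        // X{n,} : at least n times
        System.out.println(matchesAll("\\d{2,}", "7"));        // false
        System.out.println(matchesAll("\\d{2,}", "12345"));    // true

        // X{n,m} : at least n, but no more than m times
        System.out.println(matchesAll("\\d{2,4}", "123"));     // true
        System.out.println(matchesAll("\\d{2,4}", "12345"));   // false
    }
}
```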