The designers of the regular expression library chose a Pattern-Matcher model, which separates
the regular expression from the matcher itself. The Pattern class compiles the regular expression
into a more optimized form. This compiled pattern can then be used with multiple matchers, or
reused by the same matcher on different strings.
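The separation described above can be sketched as follows. This is a minimal illustration, not the book's own listing: the class name PatternReuseSketch, the helper countMatches, and the sample strings are made up for the example. It relies only on Pattern.compile, Pattern.matcher, Matcher.reset(CharSequence), and Matcher.find from java.util.regex.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternReuseSketch {
    // Compile the expression once; the compiled Pattern can be shared freely.
    static final Pattern JAVA = Pattern.compile("Java");

    // Hypothetical helper: counts the matches in each input,
    // reusing a single Matcher by resetting it onto the next string.
    static int[] countMatches(String... inputs) {
        int[] counts = new int[inputs.length];
        Matcher matcher = JAVA.matcher("");
        for (int i = 0; i < inputs.length; i++) {
            matcher.reset(inputs[i]);
            while (matcher.find()) {
                counts[i]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // One compiled pattern, two independent matchers.
        Matcher first = JAVA.matcher("Java and more Java");
        Matcher second = JAVA.matcher("no match here");
        System.out.println(first.find());
        System.out.println(second.find());

        // One matcher reused across several strings.
        System.out.println(Arrays.toString(
                countMatches("Java and more Java", "no match here", "JavaScript")));
    }
}
```

Compiling once and resetting the matcher avoids recompiling the expression for every input string, which is the main practical benefit of keeping Pattern and Matcher separate.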
In a regular expression, a character matches itself literally, with a few exceptions. One such
exception is the period (.), which matches any single character in the string being analyzed (by
default, any character other than a line terminator). Sets of predefined meta-characters match
specific characters. These are listed in the following table.
Meta-Character   Matches
\\               A single backslash
\0n              An octal value describing a character, where 0 <= n <= 7
\0nn             An octal value describing a character, where each n is 0 <= n <= 7
\0mnn            An octal value describing a character, where 0 <= m <= 3 and each n is 0 <= n <= 7
\xhh             The character with hexadecimal value hh (each h is a hexadecimal digit, 0-9 or A-F)
\uhhhh           The character with hexadecimal value hhhh (each h is a hexadecimal digit, 0-9 or A-F)
\t               A tab (character ‘\u0009’)
\n               A newline (linefeed) (‘\u000A’)
\r               A carriage-return (‘\u000D’)
\f               A form-feed (‘\u000C’)
\a               A bell/beep character (‘\u0007’)
\e               An escape character (‘\u001B’)
\cx              The control character corresponding to x; for example, \cC is Control-C
.                Any single character
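A few of these escapes can be tried out with Pattern.matches, which tests whether an entire string matches an expression. The sketch below is illustrative only; the class name EscapeSketch and the helper matches are made up for the example. Note that in Java source each regex backslash must itself be doubled inside a string literal, and that java.util.regex writes the hexadecimal escape as \xhh.

```java
import java.util.regex.Pattern;

public class EscapeSketch {
    // Hypothetical helper: true if the whole input matches the expression.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // The regex is a backslash followed by t; the input is an actual tab.
        System.out.println(matches("\\t", "\t"));

        // Three ways to name the letter A (decimal 65) by value:
        // octal 101, two-digit hexadecimal 41, four-digit hexadecimal 0041.
        System.out.println(matches("\\0101", "A"));
        System.out.println(matches("\\x41", "A"));
        System.out.println(matches("\\u0041", "A"));

        // Four backslashes in source form the regex for one literal backslash.
        System.out.println(matches("\\\\", "\\"));

        // Control-C is character code 3.
        System.out.println(matches("\\cC", String.valueOf((char) 3)));

        // The period stands for exactly one character.
        System.out.println(matches("c.t", "cat"));
        System.out.println(matches("c.t", "ct"));
    }
}
```

Every call prints true except the last one, where the period has no character to consume between c and t.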
The regular expression language also has meta-characters to match against certain string boundaries.
Some of these boundaries are the beginning and end of a line, and the beginning and end of words. The
full list of boundary meta-characters can be seen in the following table.
Meta-Character   Matches
^                Beginning of the line
$                End of the line
\b               A word boundary
\B               A non-word boundary
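The boundary meta-characters above can be illustrated with a short sketch. The class name BoundarySketch, the helper matchStarts, and the sample text are made up for the example. One detail worth knowing: in java.util.regex, ^ and $ match only at the beginning and end of the whole input by default, and match at each line when the pattern is compiled with Pattern.MULTILINE.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundarySketch {
    // Hypothetical helper: the start index of every match in the input.
    static List<Integer> matchStarts(Pattern pattern, String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            starts.add(matcher.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String text = "cat concat cat.";

        // Word boundaries on both sides: only the standalone words match.
        System.out.println(matchStarts(Pattern.compile("\\bcat\\b"), text));

        // Non-word boundary in front: only the "cat" inside "concat" matches.
        System.out.println(matchStarts(Pattern.compile("\\Bcat"), text));

        String lines = "one\ntwo";

        // Default mode: ^ anchors to the start of the input, so "two" is not found.
        System.out.println(matchStarts(Pattern.compile("^two"), lines));

        // MULTILINE mode: ^ also matches just after a line terminator.
        System.out.println(matchStarts(Pattern.compile("^two", Pattern.MULTILINE), lines));

        // $ matches at the end of the input.
        System.out.println(matchStarts(Pattern.compile("two$"), lines));
    }
}
```

With the sample text, the first pattern matches at indices 0 and 11 and the second at index 7. The unflagged ^two finds nothing, while the MULTILINE version and two$ both match at index 4.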
Part I: Thinking Like a Java Developer