regexp.5 (2010 09)
regexp(5) regexp(5)
\(RE\) A subexpression can be defined within an RE by enclosing it between the character
pairs \( and \). Such a subexpression matches whatever it would have matched
without the \( and \). Subexpressions can be arbitrarily nested. An asterisk
immediately following the \( loses its special meaning and is treated as itself. An
asterisk immediately following the \) is treated as an invalid character.
\n The expression \n matches the same string of characters as was matched by a
subexpression enclosed between \( and \) preceding the \n. The character n must be a
digit from 1 through 9, specifying the n-th subexpression (the one that begins with the
n-th \( and ends with the corresponding paired \)). For example, the expression
^\(.*\)\1$ matches a line consisting of two adjacent appearances of the same string.
If the \n is followed by an asterisk, it matches zero or more occurrences of the
subexpression referred to. For example, the expression \(ab\(cd\)ef\)Z\2*Z\1
matches the string abcdefZcdcdZabcdef.
RE\{m,n\} An RE matching a single character followed by \{m\}, \{m,\}, or \{m,n\} is
an RE that matches repeated occurrences of the RE. The values of m and n must be
decimal integers in the range 0 through 255, with m specifying the exact or
minimum number of occurrences and n specifying the maximum number of
occurrences. \{m\} matches exactly m occurrences of the preceding RE, \{m,\}
matches at least m occurrences, and \{m,n\} matches any number of occurrences
between m and n, inclusive.
The first encountered string that matches the expression is chosen; it will contain as
many occurrences of the RE as possible. For example, in the string
abbbbbbbc
the RE b\{3\} is matched by characters two through four, the RE b\{3,\} is
matched by characters two through eight, and the RE b\{3,5\}c is matched by
characters four through nine.
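The back-reference and interval examples above can be checked with a short sketch. It uses Python's re module, which is not the HP-UX basic RE engine: Python writes grouping as ( ) and intervals as { } where the notation above uses \( \) and \{ \}, so each pattern below is a hand translation, with the basic RE spelling given in a comment. The helper name span_1based is an assumption introduced only for this illustration.

```python
import re


def span_1based(pattern, text):
    """Return (first, last) 1-based character positions of the first match, or None."""
    m = re.search(pattern, text)
    if m is None:
        return None
    # m.start() is 0-based; m.end() is exclusive, so it equals the 1-based last position.
    return (m.start() + 1, m.end())


# Back-reference: a line made of two adjacent copies of the same string.
doubled = re.compile(r"^(.*)\1$")            # basic RE: ^\(.*\)\1$
print(bool(doubled.search("abcabc")))        # True
print(bool(doubled.search("abcabd")))        # False

# Nested subexpressions: group 1 starts at the first "(", group 2 at the second.
nested = re.compile(r"(ab(cd)ef)Z\2*Z\1")    # basic RE: \(ab\(cd\)ef\)Z\2*Z\1
m = nested.fullmatch("abcdefZcdcdZabcdef")
print(m.group(1), m.group(2))                # abcdef cd

# Intervals applied to the string used in the text.
text = "abbbbbbbc"
print(span_1based(r"b{3}", text))            # (2, 4)   basic RE: b\{3\}
print(span_1based(r"b{3,}", text))           # (2, 8)   basic RE: b\{3,\}
print(span_1based(r"b{3,5}c", text))         # (4, 9)   basic RE: b\{3,5\}c
```

The reported positions agree with the character ranges given in the text. Python does not enforce the 0 through 255 limit on m and n, so that restriction is not exercised here.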
Expression Anchoring
An RE can be limited to matching strings that begin or end a line (i.e., anchored) according to the
following rules:
• A circumflex (^) as the first character of an RE anchors the expression to the beginning of a line;
only strings starting at the first character of a line are matched by the RE. For example, the RE
^ab matches the string ab in the line abcdef, but not the same string in the line
cdefab.
• A dollar sign ($) as the last character of an RE anchors the expression to the end of a line; only
strings ending at the last character of a line are matched by the RE. For example, the RE ab$
matches the string ab in the line cdefab, but not the same string in the line
abcdef.
• An RE anchored by both ^ and $ matches only strings that are entire lines. For example, the RE
^abcdef$ matches only lines consisting of the string abcdef.
The use of duplication characters (+,*) following anchors is illegal.
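The three anchoring rules can be tried out with a minimal sketch, again using Python's re module as a stand-in rather than the HP-UX implementation. The helper name found is hypothetical. Each input is a single line without a trailing newline, because Python's $ also matches just before a final newline, which is a Python detail and not something stated above.

```python
import re


def found(pattern, line):
    """Return True if the pattern matches anywhere in the given line."""
    return re.search(pattern, line) is not None


# Circumflex: only a match starting at the first character of the line counts.
print(found(r"^ab", "abcdef"))        # True
print(found(r"^ab", "cdefab"))        # False

# Dollar sign: only a match ending at the last character of the line counts.
print(found(r"ab$", "cdefab"))        # True
print(found(r"ab$", "abcdef"))        # False

# Both anchors: the RE must account for the entire line.
print(found(r"^abcdef$", "abcdef"))   # True
print(found(r"^abcdef$", "abcdefg"))  # False
```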
EXTENDED REGULAR EXPRESSIONS
The extended regular expression (ERE) notation and construction rules apply to utilities defined as using
extended REs. Any exceptions to the following rules are noted in the descriptions of the specific utilities
using EREs.
EREs Matching a Single Character
The following EREs match a single character or a single collating element:
Ordinary Characters
An ordinary character is an ERE that matches itself. An ordinary character is any character in the
supported character set except newline and the regular expression special characters listed in Special
Characters below. An ordinary character preceded by a backslash (\) is treated as the ordinary character
itself. Matching is based on the bit pattern used for encoding the character, not on the graphic
representation of the character.
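The point about bit patterns can be illustrated with a small sketch. It assumes Python's re module operating on byte strings and uses two encodings (ISO 8859-1 and UTF-8) chosen only for the example. The helper name matches_bytes is hypothetical. The same graphic character, e with an acute accent, matches only when the pattern and the data hold the same encoded bytes.

```python
import re


def matches_bytes(pattern, data):
    """Return True if the byte pattern occurs in the byte string."""
    return re.search(pattern, data) is not None


e_acute = "\u00e9"                       # one graphic character
as_latin1 = e_acute.encode("latin-1")    # one byte: 0xE9
as_utf8 = e_acute.encode("utf-8")        # two bytes: 0xC3 0xA9

# Same bit pattern on both sides: match.
print(matches_bytes(as_latin1, as_latin1))   # True

# Same graphic character, different bit patterns: no match.
print(matches_bytes(as_latin1, as_utf8))     # False
```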
Special Characters
A regular expression special character preceded by a backslash is a regular expression that matches the
special character itself. When not preceded by a backslash, such characters have special meaning in the
specification of EREs. The extended regular expression special characters and the contexts in which they
4 Hewlett-Packard Company − 4 − HP-UX 11i Version 3: September 2010