HP-UX Reference (11i v3 07/02) - 5 Miscellaneous Topics (vol 9)
regexp(5) regexp(5)
\(RE\) A subexpression can be defined within an RE by enclosing it between the character
pairs \( and \). Such a subexpression matches whatever it would have matched
without the \( and \). Subexpressions can be arbitrarily nested. An asterisk
immediately following the \( loses its special meaning and is treated as itself. An
asterisk immediately following the \) is treated as an invalid character.
\n The expression \n matches the same string of characters as was matched by a subexpression
enclosed between \( and \) preceding the \n. The character n must be a digit from 1 through 9,
specifying the n-th subexpression (the one that begins with the n-th \( and ends with the
corresponding paired \)). For example, the expression ^\(.*\)\1$ matches a line consisting of
two adjacent appearances of the same string.
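The doubled-line example can be tried with a short sketch. It uses Python's re module as a stand-in matcher, which is an assumption of this example and not something the utilities described here provide. Python writes the grouping operators as ( and ) where the basic RE syntax writes \( and \); the \1 backreference is spelled the same way. The helper name is_doubled_line is hypothetical.

```python
import re

# Python spelling of the basic RE ^\(.*\)\1$ (assumption: Python's re
# module stands in for a BRE matcher; it uses ( ) instead of \( \)).
DOUBLED = re.compile(r"^(.*)\1$")

def is_doubled_line(line):
    """Return True if line is two adjacent copies of the same string."""
    return DOUBLED.match(line) is not None

print(is_doubled_line("abcabc"))  # True: "abc" followed by "abc"
print(is_doubled_line("abcabd"))  # False: the two halves differ
```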
If the \n is followed by an asterisk, it matches zero or more occurrences of the subexpression
referred to. For example, the expression \(ab\(cd\)ef\)Z\2*Z\1 matches the string
abcdefZcdcdZabcdef.
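A minimal sketch of the nested-subexpression example follows, again assuming Python's re module as a stand-in (it writes \( and \) as plain parentheses). It shows how the two subexpressions are numbered and that the asterisk after \2 also permits zero repetitions. The name PATTERN is hypothetical.

```python
import re

# Python spelling of \(ab\(cd\)ef\)Z\2*Z\1 (assumption: Python's re
# module stands in for a BRE matcher).
PATTERN = re.compile(r"(ab(cd)ef)Z\2*Z\1")

m = PATTERN.fullmatch("abcdefZcdcdZabcdef")
# Subexpressions are numbered by the position of their opening parenthesis.
print(m.group(1))  # abcdef
print(m.group(2))  # cd

# \2* also accepts zero repetitions of "cd" between the two Z characters.
print(PATTERN.fullmatch("abcdefZZabcdef") is not None)  # True
```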
RE\{m,n\} An RE matching a single character followed by \{m\}, \{m,\}, or \{m,n\} is an
RE that matches repeated occurrences of the RE. The values of m and n must be
decimal integers in the range 0 through 255, with m specifying the exact or minimum
number of occurrences and n specifying the maximum number of occurrences.
\{m\} matches exactly m occurrences of the preceding RE, \{m,\} matches at
least m occurrences, and \{m,n\} matches any number of occurrences between m
and n, inclusive.
The first encountered string that matches the expression is chosen; it will contain as
many occurrences of the RE as possible. For example, in the string abbbbbbbc the
RE b\{3\} is matched by characters two through four, the RE b\{3,\} is matched
by characters two through eight, and the RE b\{3,5\}c is matched by characters
four through nine.
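The character positions quoted above can be checked with a small sketch. It assumes Python's re module as a stand-in matcher; Python writes the interval as {m,n} where the basic RE syntax writes \{m,n\}. The helper matched_chars is hypothetical and reports 1-based, inclusive positions to match the wording of the text.

```python
import re

def matched_chars(pattern, text):
    """Return the 1-based (first, last) character positions of the first match."""
    m = re.search(pattern, text)
    if m is None:
        return None
    # m.start() is 0-based; m.end() is exclusive, so it equals the 1-based last position.
    return (m.start() + 1, m.end())

line = "abbbbbbbc"
print(matched_chars(r"b{3}", line))     # (2, 4)
print(matched_chars(r"b{3,}", line))    # (2, 8)
print(matched_chars(r"b{3,5}c", line))  # (4, 9)
```

The last case starts at character four because a run of at most five b characters must be followed immediately by c, and only the final five b characters satisfy that.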
Expression Anchoring
An RE can be limited to matching strings that begin or end a line (i.e., anchored) according to the following
rules:
• A circumflex (^) as the first character of an RE anchors the expression to the beginning of a line;
only strings starting at the first character of a line are matched by the RE. For example, the RE
^ab matches the string ab in the line abcdef, but not the same string in the line cdefab.
• A dollar sign ($) as the last character of an RE anchors the expression to the end of a line; only
strings ending at the last character of a line are matched by the RE. For example, the RE ab$
matches the string ab in the line cdefab, but not the same string in the line abcdef.
• An RE anchored by both ^ and $ matches only strings that are lines. For example, the RE
^abcdef$ matches only lines consisting of the string abcdef.
The use of duplication characters (+,*) following anchors is illegal.
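The anchoring examples can be reproduced with a brief sketch, assuming Python's re module as a stand-in matcher (the ^ and $ anchors are spelled the same way there). Each line is passed without a trailing newline. The helper name matches is hypothetical.

```python
import re

def matches(pattern, line):
    """Return True if pattern matches somewhere in line."""
    return re.search(pattern, line) is not None

print(matches(r"^ab", "abcdef"))          # True: ab begins the line
print(matches(r"^ab", "cdefab"))          # False: ab is not at the start
print(matches(r"ab$", "cdefab"))          # True: ab ends the line
print(matches(r"ab$", "abcdef"))          # False: ab is not at the end
print(matches(r"^abcdef$", "abcdef"))     # True: the whole line matches
print(matches(r"^abcdef$", "xabcdefx"))   # False: extra characters on the line
```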
EXTENDED REGULAR EXPRESSIONS
The extended regular expression (ERE) notation and construction rules apply to utilities defined as using
extended REs. Any exceptions to the following rules are noted in the descriptions of the specific utilities
using EREs.
EREs Matching a Single Character
The following EREs match a single character or a single collating element:
Ordinary Characters
An ordinary character is an ERE that matches itself. An ordinary character is any character in the
supported character set except newline and the regular expression special characters listed in
Special Characters below. An ordinary character preceded by a backslash (\) is treated as the
ordinary character itself.
Matching is based on the bit pattern used for encoding the character, not on the graphic representation of
the character.
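The point about encoding versus appearance can be illustrated by analogy. The sketch below assumes Python's re module, which compares code points, not the byte patterns of an HP-UX locale, so it only approximates the rule stated here. Two strings that render identically fail to match when their underlying encodings differ.

```python
import re
import unicodedata

composed = "caf\u00e9"                               # é stored as one code point
decomposed = unicodedata.normalize("NFD", composed)  # e followed by a combining acute accent

# Both strings look the same when displayed, but are encoded differently.
print(re.search("\u00e9", composed) is not None)     # True
print(re.search("\u00e9", decomposed) is not None)   # False
```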
Special Characters
A regular expression special character preceded by a backslash is a regular expression that matches the
special character itself. When not preceded by a backslash, such characters have special meaning in the
406 Hewlett-Packard Company − 4 − HP-UX 11i Version 3: February 2007