regexp.3x (2010 09)
regexp(3X) regexp(3X)
     42   \( \) imbalance.
     43   Too many \(.
     44   More than 2 numbers given in \{ \}.
     45   } expected after \.
     46   First number exceeds second in \{ \}.
     49   [ ] imbalance.
     50   Regular expression overflow.
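Because ERROR(c) is supplied by the program that includes this file, the program decides how each code is reported. The sketch below assumes a hypothetical helper, regexp_errmsg(), that maps the codes listed above to their messages; codes from earlier in the table would be added the same way.

```c
#include <stddef.h>

/* Hypothetical helper: map a regexp(3X) error code to its message.
   Only the codes listed above are covered; any other code yields NULL. */
const char *regexp_errmsg(int code)
{
    switch (code) {
    case 42: return "\\( \\) imbalance.";
    case 43: return "Too many \\(.";
    case 44: return "More than 2 numbers given in \\{ \\}.";
    case 45: return "} expected after \\.";
    case 46: return "First number exceeds second in \\{ \\}.";
    case 49: return "[ ] imbalance.";
    case 50: return "Regular expression overflow.";
    default: return NULL;
    }
}
```

A program might then define ERROR(c) to print regexp_errmsg(c) and abandon the compilation, for example by exiting.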
The syntax of the compile() routine is as follows:

     compile(instring, expbuf, endbuf, eof)
The first parameter, instring, is never used explicitly by the compile() routine, but is useful for
programs that pass down different pointers to input characters. It is sometimes used in the INIT
declaration (see below). Programs that call functions to input characters or have characters in an
external array can pass down a value of ((char *) 0) for this parameter.
The next parameter expbuf is a character pointer. It points to the location where the compiled regular
expression will be placed.
The parameter endbuf is one more than the highest address where the compiled regular expression can
be placed. If the compiled expression cannot fit in (endbuf - expbuf) bytes, a call to ERROR(50) is made.
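Because endbuf is one past the last usable byte, a caller typically declares an array such as char expbuf[ESIZE] and passes &expbuf[ESIZE] as endbuf, so that endbuf - expbuf equals the buffer size (ESIZE is a hypothetical constant chosen by the caller). The sketch below illustrates that half-open bound with a hypothetical helper, emit_byte(); it is not the internal code of compile().

```c
#include <stddef.h>

/* Hypothetical helper: store one byte of compiled output at *epp, never
   writing at or beyond endbuf (which is one past the last usable byte).
   Returns 50, the "Regular expression overflow." code, when the buffer
   is full; otherwise stores the byte, advances *epp, and returns 0. */
int emit_byte(char **epp, char *endbuf, char byte)
{
    if (*epp >= endbuf)
        return 50;
    *(*epp)++ = byte;
    return 0;
}
```

With a three-byte buffer, three calls succeed and the fourth reports code 50, which is the point at which compile() calls ERROR(50).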
The parameter eof is the character which marks the end of the regular expression. For example, in ed(1),
this character is usually a /.
Each program that includes this file must have a #define statement for INIT. This definition is
placed right after the declaration for the function compile() and the opening curly brace {. It is used
for dependent declarations and initializations. Most often it is used to set a register variable to point to
the beginning of the regular expression so that this register variable can be used in the declarations for
GETC(), PEEKC(), and UNGETC(). Otherwise it can be used to declare external variables that might
be used by GETC(), PEEKC(), and UNGETC(). See the example below of the declarations taken from
grep(1).
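The following sketch shows the macro interface in isolation: INIT declares a register pointer to the start of instring, and GETC(), PEEKC(), and UNGETC() read through it. Because the real compile() body cannot be reproduced here, scan_pattern() is a hypothetical stand-in that only copies the pattern up to the eof delimiter; treating "\/" as a literal delimiter is an illustrative assumption, not documented behavior of compile().

```c
#include <stddef.h>

/* Input macros in the style described above: INIT declares a pointer to
   the start of the pattern, and the other macros read through it. */
#define INIT        register char *sp = instring;
#define GETC()      (*sp++)
#define PEEKC()     (*sp)
#define UNGETC(c)   (--sp)

/* Hypothetical stand-in for compile(): copies the pattern text up to the
   eof delimiter (or the terminating null) into [expbuf, endbuf) and
   returns the number of bytes stored, or -1 if endbuf would be exceeded.
   It exercises only the macro interface; it compiles nothing. */
int scan_pattern(char *instring, char *expbuf, char *endbuf, int eof)
{
    INIT
    char *ep = expbuf;
    int c;

    while ((c = GETC()) != eof) {
        if (c == '\0') {            /* ran off the end: push the null back */
            UNGETC(c);
            break;
        }
        if (c == '\\' && PEEKC() == eof)
            c = GETC();             /* assumed: "\/" stores a literal '/' */
        if (ep >= endbuf)
            return -1;
        *ep++ = (char)c;
    }
    return (int)(ep - expbuf);
}
```

Keeping the input pointer inside INIT lets the same routine read from a string, as here, or from a function-based source simply by redefining the macros.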
Actual regular expression matching is performed by other functions in this file, one of which is step().
The call to step() is as follows:

     step(string, expbuf)
The first parameter to step() is a pointer to a string of characters to be checked for a match. This
string should be null-terminated.
The second parameter, expbuf, is the compiled regular expression that was obtained by a call to
compile().
step() returns non-zero if the given string matches the regular expression, and zero if it does not.
If there is a match, two external character pointers are set as a side effect of the call to step().
The variable set in step() is loc1, a pointer to the first character that matched the regular
expression. The variable loc2, which is set by the function advance(), points to the character after
the last character that matched the regular expression. Thus, if the regular expression matches the
entire line, loc1 points to the first character of string and loc2 points to the null at the end of string.
step() uses the external variable circf, which is set by compile() if the regular expression begins
with ^. If circf is set, step() tries to match the regular expression only at the beginning of the string.
If more than one regular expression is to be compiled before the first is executed, the value of circf should
be saved for each compiled expression, and circf should be restored to that saved value before each call to
step().
advance() is called from step() with the same arguments as step(). The purpose of step() is
to step through the string argument, calling advance() until advance() returns non-zero (indicating
a match) or the end of string is reached. To constrain the match to the beginning of string in all
cases, do not call step(); simply call advance().
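The division of labor between step() and advance(), the role of loc1, loc2, and circf, and the advice to save circf per expression can be sketched with a small backtracking matcher in the Kernighan-Pike style. This is not the regexp.h implementation: here the "compiled" form is just the pattern text, toy_compile() and struct saved_re are hypothetical, and only literal characters, '.', and a trailing '*' are supported (no \{ \}, brackets, or locs).

```c
#include <stddef.h>

/* Simplified stand-ins for the external variables described above. */
char *loc1, *loc2;
int circf;

/* Hypothetical toy "compiler": the compiled form is the pattern text
   itself; a leading '^' sets circf and is stripped. */
char *toy_compile(char *pat)
{
    circf = (*pat == '^');
    return circf ? pat + 1 : pat;
}

/* Toy advance(): match ep anchored at lp, supporting literal characters,
   '.', and a '*' following either. Sets loc2 on success. */
int advance(char *lp, char *ep)
{
    for (;;) {
        if (*ep == '\0') {              /* pattern exhausted: a match */
            loc2 = lp;
            return 1;
        }
        if (ep[1] == '*') {
            char *start = lp;
            /* consume as many repetitions as possible ... */
            while (*lp != '\0' && (ep[0] == '.' || *lp == ep[0]))
                lp++;
            /* ... then back up one character at a time until the rest
               of the pattern matches or the starting point is reached */
            for (;;) {
                if (advance(lp, ep + 2))
                    return 1;
                if (lp == start)
                    return 0;
                lp--;
            }
        }
        if (*lp == '\0' || (ep[0] != '.' && *lp != ep[0]))
            return 0;
        lp++;
        ep++;
    }
}

/* Toy step(): call advance() at each position (including the final
   null) unless circf anchors the match; sets loc1 on success. */
int step(char *p1, char *p2)
{
    if (circf) {
        loc1 = p1;
        return advance(p1, p2);
    }
    do {
        if (advance(p1, p2)) {
            loc1 = p1;
            return 1;
        }
    } while (*p1++ != '\0');
    return 0;
}

/* circf is shared by every compiled pattern, so each pattern keeps its
   own copy (hypothetical struct) that is restored before step(). */
struct saved_re { char *expbuf; int circf; };

int run_saved(struct saved_re *re, char *line)
{
    circf = re->circf;
    return step(line, re->expbuf);
}
```

For the line "say hello world", the unanchored pattern "hel*o" matches with loc1 pointing at the 'h' (offset 4) and loc2 just past the 'o' (offset 9), while "^hel*o" fails because circf restricts the attempt to offset 0.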
When advance() encounters a * or \{ \} sequence in the regular expression, it advances its pointer
into the string to be matched as far as possible and recursively calls itself, trying to match the rest of the
string to the rest of the regular expression. As long as there is no match, advance() backs up along the
string until it finds a match or reaches the point in the string that initially matched the * or \{ \}. It is
sometimes desirable to stop this backing up before the initial point in the string is reached. If the
external character pointer locs is equal to the point in the string at some time during the backing-up
process, advance() breaks out of the loop that backs up and returns zero. This is used by ed(1) and sed(1) for
substitutions done globally (not just the first occurrence, but the whole line) so, for example, expressions
2 Hewlett-Packard Company − 2 − HP-UX 11i Version 3: September 2010