User Guide

Multicharacter Regular Expressions
You can use the following rules to build multicharacter regular expressions:
Parentheses group parts of regular expressions together into grouped
subexpressions that you can treat as a single unit; for example, (ha)+ matches
one or more instances of ha.
A plus sign (+) following a one-character regular expression or grouped
subexpression matches one or more occurrences of the regular expression; for
example, [a-z]+ matches one or more lowercase characters.
An asterisk (*) following a one-character regular expression or grouped
subexpression matches zero or more occurrences of the regular expression; for
example, [a-z]* matches zero or more lowercase characters. Because a regular
expression followed by an * can match the empty string, you can get unexpected
results when there is no actual match. For example,
<cfoutput>REReplace("Hello","[T]*","7","ALL") -
#REReplace("Hello","[T]*","7","ALL")#<BR></cfoutput>
results in the following output:
REReplace("Hello","[T]*","7","ALL") - 7H7e7l7l7o
Here the regular expression [T]* can match empty strings. It first matches the
empty string before H in Hello. Because the ALL argument tells REReplace to
replace all instances of an expression, it next matches the empty string before
e, and so on until it matches the empty string before o. This result
might be unexpected. The workarounds for these types of problems are specific
to each case. In some cases you can use [T]+, which requires at least one T,
instead of [T]*. Alternatively, you might be able to specify an additional pattern
after [T]*. In the following example the regular expression has a W at the end:
<cfoutput>REReplace("Hello World","[T]*W","7","ALL") -
#REReplace("Hello World","[T]*W","7","ALL")#<BR></cfoutput>
This expression results in the following more predictable output:
REReplace("Hello World","[T]*W","7","ALL") - Hello 7orld
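The first workaround, replacing [T]* with [T]+, can be sketched in the same way. This is a minimal illustration that assumes the same REReplace call as above with only the pattern changed. Because Hello contains no T, [T]+ has nothing to match, so the string should come back unchanged:

```cfml
<!--- [T]+ requires at least one T, so it cannot match an empty string --->
<cfoutput>REReplace("Hello","[T]+","7","ALL") -
#REReplace("Hello","[T]+","7","ALL")#<BR></cfoutput>
```

The expected output is the label followed by Hello, with no 7 inserted between the letters.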
A one-character regular expression or grouped subexpression followed by a
question mark (?) matches zero or one occurrence of the regular expression; for
example, xy?z matches either xyz or xz.
The concatenation of regular expressions creates a regular expression that
matches the corresponding concatenation of strings; for example, [A-Z][a-z]*
matches any capitalized word.
The OR character (|) allows a choice between two regular expressions; for
example, jell(y|ies) matches either jelly or jellies.
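The rules above can be tried out with REReplace, the function already used in this section. The following is a minimal sketch rather than a definitive test: the input strings and the replacement letter X are made up for illustration, and each call replaces every match so that the matched text is easy to see.

```cfml
<cfoutput>
<!--- (ha)+ treats ha as one unit, so the whole run hahaha is one match --->
#REReplace("hahaha!","(ha)+","X","ALL")#<BR>
<!--- xy?z makes the y optional: xyz and xz match, xyyz does not --->
#REReplace("xyz xz xyyz","xy?z","X","ALL")#<BR>
<!--- Concatenation: one uppercase letter followed by lowercase letters --->
#REReplace("Hello World","[A-Z][a-z]*","X","ALL")#<BR>
<!--- jell(y|ies) accepts either alternative after jell --->
#REReplace("jelly and jellies","jell(y|ies)","X","ALL")#<BR>
</cfoutput>
```

If the rules behave as described, the four output lines should read X!, then X X xyyz, then X X, and finally X and X.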
The following suffixes match repetitions of a regular expression:
{m,n}, where m is 0 or greater and n is greater than or equal to m, forces a
match of m through n (inclusive) occurrences of the preceding regular
expression; for example, (ba){2,4} matches baba, bababa, and
babababa, but not ba or babababababa.
{m,} forces a match of at least m occurrences of the preceding regular
expression. The syntax {,n} is not allowed.
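The repetition suffixes can be checked with REFind, which returns the position of the first match or 0 when there is none. This sketch makes one assumption beyond the text above: it wraps each pattern in the ^ and $ anchors, which are not covered in this section, so that the whole string must match rather than just part of it.

```cfml
<cfoutput>
<!--- ^ and $ force the pattern to cover the entire string --->
<!--- One repetition is below the minimum of 2 --->
ba: #REFind("^(ba){2,4}$","ba")#<BR>
<!--- Two and four repetitions fall inside the 2 through 4 range --->
baba: #REFind("^(ba){2,4}$","baba")#<BR>
babababa: #REFind("^(ba){2,4}$","babababa")#<BR>
<!--- Six repetitions exceed the maximum of 4 --->
babababababa: #REFind("^(ba){2,4}$","babababababa")#<BR>
<!--- {2,} sets only a lower bound, so six repetitions are accepted --->
babababababa: #REFind("^(ba){2,}$","babababababa")#<BR>
</cfoutput>
```

The values printed after the labels should be 0, 1, 1, 0, and 1. Without the anchors, REFind would also report a match for the six-repetition string under {2,4}, because the first four repetitions alone satisfy the pattern.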