User Guide
Multicharacter Regular Expressions
You can use the following rules to build multicharacter regular expressions:
• Parentheses group parts of regular expressions together into grouped
subexpressions that you can treat as a single unit; for example, (ha)+ matches
one or more instances of “ha”.
• A plus sign (+) following a one-character regular expression or grouped
subexpression matches one or more occurrences of the regular expression; for
example, [a-z]+ matches one or more lowercase characters.
• An asterisk (*) following a one-character regular expression or grouped
subexpression matches zero or more occurrences of the regular expression; for
example, [a-z]* matches zero or more lowercase characters. Because a regular
expression followed by an * can match the empty string, you can get unexpected
results when there is no actual match. For example,
<cfoutput>REReplace("Hello","[T]*","7","ALL") -
#REReplace("Hello","[T]*","7","ALL")#<BR></cfoutput>
results in the following output:
REReplace("Hello","[T]*","7","ALL") - 7H7e7l7l7o
Here the regular expression [T]* can match empty strings. It first matches the
empty string before “H” in “Hello”. Next, because the “ALL” argument tells
REReplace to replace all instances of an expression, it matches the empty string
before “e”, and so on until it matches the empty string before “o”. This result
might be unexpected. The workarounds for these types of problems are specific
to each case. In some cases you can use [T]+, which requires at least one “T”,
instead of [T]*. Alternatively, you might be able to specify an additional pattern
after [T]*. In the following example the regular expression has a “W” at the end:
<cfoutput>REReplace("Hello World","[T]*W","7","ALL") -
#REReplace("Hello World","[T]*W","7","ALL")#<BR></cfoutput>
This expression results in the following more predictable output:
REReplace("Hello World","[T]*W","7","ALL") - Hello 7orld
• A one-character regular expression or grouped subexpression followed by a
question mark (?) matches zero or one occurrence of the regular expression; for
example, xy?z matches either “xyz” or “xz”.
• The concatenation of regular expressions creates a regular expression that
matches the corresponding concatenation of strings; for example, [A-Z][a-z]*
matches any capitalized word.
• The OR character (|) allows a choice between two regular expressions; for
example, jell(y|ies) matches either “jelly” or “jellies”.
• The following suffixes match repetitions of a regular expression:
− {m,n}, where m is 0 or greater and n is greater than or equal to m, forces a
match of m through n (inclusive) occurrences of the preceding regular
expression; for example, (ba){2,4} matches “baba”, “bababa”, and
“babababa”, but not “ba” or “babababababa”.
− {m,} forces a match of at least m occurrences of the preceding regular
expression. The syntax {,n} is not allowed.
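The rules above can be tried out with a short page such as the following. This is a minimal sketch, not a definitive test suite: the patterns are taken from the rules in this section, but the strings being searched are illustrative choices, and it assumes that REFind returns the position of the first match or 0 when there is no match.

```cfml
<!--- Sketch only: patterns come from the rules above; the searched strings are illustrative. --->
<!--- Assumes REFind returns the position of the first match, or 0 if nothing matches. --->
<cfoutput>
(ha)+ in "hahaha": #REFind("(ha)+", "hahaha")#<BR>
xy?z in "xz": #REFind("xy?z", "xz")#<BR>
jell(y|ies) in "two jellies": #REFind("jell(y|ies)", "two jellies")#<BR>
(ba){2,4} in "baba": #REFind("(ba){2,4}", "baba")#<BR>
(ba){2,4} in "ba": #REFind("(ba){2,4}", "ba")#<BR>
<!--- [T]+ needs at least one "T", so it cannot match the empty string. --->
[T]+ replaced in "Hello": #REReplace("Hello", "[T]+", "7", "ALL")#<BR>
</cfoutput>
```

If the rules behave as described, the first four REFind calls should report a nonzero position (1, 1, 5, and 1), the fifth should report 0 because “ba” contains only one occurrence of the group, and the REReplace call should leave “Hello” unchanged, in contrast to the [T]* example earlier in this section.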