ColdFusion 5: Developing Applications

Multicharacter Regular Expressions

You can use the following rules to build multicharacter regular expressions:

• Parentheses group parts of a regular expression into subexpressions that you can treat as a single unit; for example, (ha)+ matches one or more instances of “ha”.

• A plus sign (+) following a one-character regular expression or a grouped subexpression matches one or more occurrences of that expression; for example, [a-z]+ matches one or more lowercase characters.

• An asterisk (*) following a one-character regular expression or a grouped subexpression matches zero or more occurrences of that expression; for example, [a-z]* matches zero or more lowercase characters. Because a regular expression followed by * can match the empty string, you can get unexpected results when the text contains no actual match. For example,

<cfoutput>REReplace("Hello","[T]*","7","ALL") -
#REReplace("Hello","[T]*","7","ALL")#<BR></cfoutput>

results in the following output:

REReplace("Hello","[T]*","7","ALL") - 7H7e7l7l7o

Here the regular expression [T]* can match empty strings. It first matches the empty string before “H” in “Hello”. Next (the “ALL” argument tells REReplace to replace all instances of an expression), it matches the empty string before “e”, and so on until it matches the empty string before “o”. This result might be unexpected. The workarounds for this type of problem are specific to each case. In some cases you can use [T]+, which requires at least one “T”, instead of [T]*. Alternatively, you might be able to specify an additional pattern after [T]*. In the following example, the regular expression has a “W” at the end:

<cfoutput>REReplace("Hello World","[T]*W","7","ALL") -
#REReplace("Hello World","[T]*W","7","ALL")#<BR></cfoutput>

This expression results in the following more predictable output:

REReplace("Hello World","[T]*W","7","ALL") - Hello 7orld

• A one-character regular expression or grouped subexpression followed by a question mark (?) matches zero or one occurrence of that expression; for example, xy?z matches either “xyz” or “xz”.

• The concatenation of regular expressions creates a regular expression that matches the corresponding concatenation of strings; for example, [A-Z][a-z]* matches any capitalized word.

• The OR character (|) allows a choice between two regular expressions; for example, jell(y|ies) matches either “jelly” or “jellies”.

• The following suffixes match repetitions of a regular expression:

− {m,n}, where m is 0 or greater and n is greater than or equal to m, forces a match of m through n (inclusive) occurrences of the preceding regular expression; for example, (ba){2,4} matches “baba”, “bababa”, and “babababa”, but not “ba” or “babababababa”.

− {m,} forces a match of at least m occurrences of the preceding regular expression. The syntax {,n} is not allowed.
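The rules in the list above can be tried out with a short script. The sketch below is written in Python and uses its re module as a stand-in for the ColdFusion regular expression engine; this is an assumption made for illustration, since the two engines agree on these basic operators but are not identical. The helper matches is hypothetical (it is not part of ColdFusion or of re) and simply reports whether a pattern matches an entire string.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if `pattern` matches all of `text`."""
    return re.fullmatch(pattern, text) is not None


# Grouping with +: (ha)+ repeats the unit "ha" one or more times.
print(matches(r"(ha)+", "hahaha"))  # True
print(matches(r"(ha)+", "hah"))     # False

# + requires at least one occurrence; * also accepts zero.
print(matches(r"[a-z]+", ""))       # False
print(matches(r"[a-z]*", ""))       # True

# ? makes the preceding expression optional.
print([matches(r"xy?z", s) for s in ("xyz", "xz", "xyyz")])
# [True, True, False]

# Concatenation: one uppercase letter followed by lowercase letters.
print(matches(r"[A-Z][a-z]*", "Hello"))  # True

# Alternation with |.
print([matches(r"jell(y|ies)", s) for s in ("jelly", "jellies", "jell")])
# [True, True, False]

# {m,n}: "ba" repeated 1 through 6 times, tested against (ba){2,4}.
print([matches(r"(ba){2,4}", "ba" * k) for k in range(1, 7)])
# [False, True, True, True, False, False]

# {m,}: at least two occurrences, no upper bound.
print(matches(r"(ba){2,}", "ba" * 6))  # True
print(matches(r"(ba){2,}", "ba"))      # False

# The empty-match pitfall: [T]* matches the empty string at every
# position, including (in Python's engine) the end of the string.
print(re.sub(r"[T]*", "7", "Hello"))         # 7H7e7l7l7o7
# Workarounds: require at least one T, or follow [T]* with another pattern.
print(re.sub(r"[T]+", "7", "Hello"))         # Hello
print(re.sub(r"[T]*W", "7", "Hello World"))  # Hello 7orld
```

Two differences from the ColdFusion behavior described above are worth noting. Python's re.sub also replaces the empty match at the end of the string, so it prints 7H7e7l7l7o7 rather than 7H7e7l7l7o. Python also accepts {,n} as shorthand for {0,n}, whereas ColdFusion does not allow that syntax.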