regex
Matching patterns and extracting information in strings
Basic Matching
Any sequence of letters or digits will match exactly that sequence. For example, the regex “the” will match precisely those places where there is a “t” followed by an “h” followed by an “e”.
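A minimal sketch of literal matching, assuming Python's standard `re` module (these notes do not name an engine, so the choice of Python and the sample string are illustrative only). The pattern “the” is found wherever those three characters appear in sequence, including inside longer words.

```python
import re

text = "the other theme bathe"

# Every non-overlapping occurrence of the literal sequence t-h-e.
matches = re.findall("the", text)

# Start offsets of those occurrences; note that they fall inside
# "other", "theme" and "bathe" as well as the standalone word.
starts = [m.start() for m in re.finditer("the", text)]

print(matches)  # ['the', 'the', 'the', 'the']
print(starts)   # [0, 5, 10, 18]
```

Matching is case-sensitive by default, so “The” would not be found unless a flag such as `re.IGNORECASE` is passed.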
Meta Characters
.
: matches any single character except a line break
[]
: character class, matches any character within the brackets
[^]
: negated character class, matches any character not in the square brackets
*
: matches 0 or more repetitions of the preceding symbol
+
: matches 1 or more repetitions of the preceding symbol
?
: treats the preceding symbol as optional
{n,m}
: matches at least n but no more than m repetitions of the preceding symbol
()
: capturing group, a group of sub-patterns in parentheses; makes it possible to extract pieces from the match
|
: alternation, matches either the characters before or the characters after the “|”
\
: escapes the following character
^
: matches at the beginning of the input
$
: matches at the end of the input
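The meta characters above can be tried out one at a time. The following is a rough sketch using Python's `re` module (an assumption, since other engines differ in small details); the sample strings and variable names are made up for illustration.

```python
import re

# .  any single character except a line break
dot = re.findall(r"c.t", "cat cot c\nt cut")          # ['cat', 'cot', 'cut']

# [] and [^]  character class and negated character class
in_class = re.findall(r"[bc]at", "bat cat rat")       # ['bat', 'cat']
not_in_class = re.findall(r"[^bc]at", "bat cat rat")  # ['rat']

# * + ?  zero or more, one or more, optional
star = re.findall(r"ab*", "a ab abbb")                # ['a', 'ab', 'abbb']
plus = re.findall(r"ab+", "a ab abbb")                # ['ab', 'abbb']
optional = re.findall(r"colou?r", "color colour")     # ['color', 'colour']

# {n,m}  between n and m repetitions
bounded = re.findall(r"[0-9]{2,3}", "1 22 333 4444")  # ['22', '333', '444']

# () and |  capture one of two alternatives
animal = re.search(r"(cat|dog)s", "hot dogs").group(1)  # 'dog'

# Backslash escapes the following character, so the dot is literal here.
escaped = re.findall(r"1\.5", "1.5 105")              # ['1.5']

# ^ and $  anchor to the beginning and end of the input
at_start = [m.start() for m in re.finditer(r"^ab", "abab")]  # [0]
at_end = [m.start() for m in re.finditer(r"ab$", "abab")]    # [2]
```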
Character Sets
\w
: matches alphanumeric characters and the underscore, equivalent to [a-zA-Z0-9_]
\W
: matches non-alphanumeric characters, equivalent to [^a-zA-Z0-9_]
\d
: matches any digit, equivalent to [0-9]
\D
: matches any non-digit character, equivalent to [^\d]
\s
: matches whitespace characters
\S
: matches non-whitespace characters
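A short sketch of the shorthand classes, again assuming Python's `re` module; the sample string is invented. One caveat specific to Python 3: on `str` patterns `\w`, `\d` and `\s` are Unicode-aware by default, and the ASCII-only bracket equivalents apply when the `re.ASCII` flag is set.

```python
import re

sample = "id_42: ok!"

words = re.findall(r"\w+", sample)       # ['id_42', 'ok']
non_words = re.findall(r"\W+", sample)   # [': ', '!']
digits = re.findall(r"\d+", sample)      # ['42']
non_digits = re.findall(r"\D+", sample)  # ['id_', ': ok!']
spaces = re.findall(r"\s", sample)       # [' ']
non_spaces = re.findall(r"\S+", sample)  # ['id_42:', 'ok!']

# With re.ASCII, \w behaves like the explicit class [a-zA-Z0-9_].
ascii_words = re.findall(r"\w+", sample, flags=re.ASCII)
explicit = re.findall(r"[a-zA-Z0-9_]+", sample)
```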
Lookaround
A(?=B)
: positive lookahead, matches A when B is ahead of (after) it
A(?!B)
: negative lookahead, matches A when B is not ahead of (after) it
(?<=B)A
: positive lookbehind, matches A when B is behind (before) it
(?<!B)A
: negative lookbehind, matches A when B is not behind (before) it
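The four lookaround forms can be compared on one string. This is a minimal sketch assuming Python's `re` module, with the literal letters A and B standing in for arbitrary sub-patterns; the helper `starts` is hypothetical and only collects match offsets.

```python
import re

text = "AB AC BA CA"  # A occurs at offsets 0, 3, 7 and 10


def starts(pattern: str) -> list[int]:
    """Return the start offset of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text)]


ahead = starts(r"A(?=B)")        # [0]         A followed by B
not_ahead = starts(r"A(?!B)")    # [3, 7, 10]  A not followed by B
behind = starts(r"(?<=B)A")      # [7]         A preceded by B
not_behind = starts(r"(?<!B)A")  # [0, 3, 10]  A not preceded by B

# Lookarounds are zero-width: B is checked but is not part of the match.
matched_text = re.search(r"A(?=B)", text).group()  # 'A'
```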
Multiple lookaround
When multiple lookaround clauses are used prior to some target match, most regex systems seem to match only when all conditions hold. For example, with something like (?<!B)(?<!C)A, it will only match A when not preceded by B or C. As far as I can tell, this is simply equivalent to (?<!B|C)A.