RegEx functions
Functions in the regex module
All regular expression functions below currently use RE2 (Golang flavor of RE2) for the regular expression syntax. RE2 shares many features with PCRE, but is typically more efficient and does not include computationally expensive features of PCRE. For more details, see the official documentation for RE2.
Single quote ' vs double quote "
Use single quotes for regular expressions so that escape sequences such as \d don't have to be escaped twice.
For instance, to find three consecutive digits anywhere in a string, use regex_search(field, '\d\d\d') to avoid double-escaping each backslash. This is equivalent to regex_search(field, "\\d\\d\\d").
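The same double-escaping problem shows up in most languages that have string escapes. As a rough analogy only, the sketch below shows it in Python, where raw strings play the role that single quotes play here. Python's `re` module is not RE2, and the sample input string is made up for illustration.

```python
import re

# Raw string: backslashes reach the regex engine untouched
# (the analogue of a single-quoted pattern in this document).
raw_pattern = r'\d\d\d'

# Regular string: every backslash must be escaped once for the string literal
# (the analogue of a double-quoted pattern).
escaped_pattern = "\\d\\d\\d"

# Both literals produce the same six-character pattern text.
same_pattern = raw_pattern == escaped_pattern

# Either form finds three consecutive digits anywhere in the (made-up) input.
found = re.search(raw_pattern, "order 12345 shipped")
three_digits = found.group(0) if found else None
```

Here `same_pattern` is `True` and `three_digits` is `"123"`, so the only difference between the two forms is how much escaping the author has to type.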
match / imatch
regex.match(input: string, pattern: regexp, ...) -> bool
regex.imatch(input: string, pattern: regexp, ...) -> bool
match is used to match an entire string against a list of regular expressions. If at least one of the regular expressions matches the entire string, match returns true. match performs a case-sensitive regular expression match; imatch is case-insensitive. Both functions match against the entire string, so add leading and trailing wildcards (.* or .*?) to search for a substring within the input string.
If input is null, then match and imatch will return null.
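The whole-string behavior described above can be illustrated with a minimal sketch in Python. This is an analogy, not the engine's implementation: `match_any` is a hypothetical helper, Python's `re` is not RE2, and the `example.com` addresses are placeholders.

```python
import re
from typing import Optional


def match_any(value: Optional[str], *patterns: str,
              ignore_case: bool = False) -> Optional[bool]:
    """Hypothetical stand-in for the described match / imatch semantics."""
    if value is None:
        # Mirrors the documented rule: null input yields null.
        return None
    flags = re.IGNORECASE if ignore_case else 0
    # fullmatch succeeds only if a pattern covers the entire string,
    # and one matching pattern is enough.
    return any(re.fullmatch(p, value, flags) is not None for p in patterns)


substring_only = match_any("user@example.com", r"@example\.com")
with_wildcard = match_any("user@example.com", r".*@example\.com")
case_insensitive = match_any("user@EXAMPLE.com", r".*@example\.com",
                             ignore_case=True)
several_patterns = match_any("user@example.com", r".*\.org", r".*\.com")
```

In this sketch `substring_only` is `False` because the pattern does not cover the whole string, while the other three results are `True`.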
Examples
regex.match("[email protected]", ".*[email protected]") -> true
# use imatch for case-insensitive matches
regex.match("[email protected]", "@sublimesecurity.com") -> false
regex.imatch("[email protected]", "@sublimesecurity.com") -> true
# if multiple regular expressions are provided, only one needs to match
regex.match("[email protected]", "@.*.org$", "@.*.com$", "@.*.gov$") -> true
contains / icontains
regex.contains(input: string, pattern: regexp, ...) -> bool
regex.icontains(input: string, pattern: regexp, ...) -> bool
regex.contains is used to check whether a string contains a substring that matches at least one of a list of regular expressions. Unlike regex.match, the full string does not need to match. regex.contains(field, '\bfoo\b') has the same behavior as regex.match(field, '.*\bfoo\b.*').
For case-insensitive regular expression matching, use regex.icontains.
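The relationship between substring search and a wildcard-wrapped full match can be sketched in Python as follows. `contains_any` is a hypothetical helper written for this illustration, Python's `re` is not RE2, and the inputs are single-line sample strings.

```python
import re


def contains_any(value: str, *patterns: str, ignore_case: bool = False) -> bool:
    """Hypothetical stand-in for the described contains / icontains semantics."""
    flags = re.IGNORECASE if ignore_case else 0
    # search looks for a match anywhere in the string;
    # one matching pattern is enough.
    return any(re.search(p, value, flags) is not None for p in patterns)


text = "the foo bar"

# Substring search with word boundaries...
by_search = contains_any(text, r"\bfoo\b")
# ...agrees with a full match wrapped in leading and trailing wildcards.
by_fullmatch = re.fullmatch(r".*\bfoo\b.*", text) is not None

# "food" has no word boundary after "foo", so neither form matches.
no_boundary = contains_any("food", r"\bfoo\b")

# Case-insensitive variant, analogous to icontains.
mixed_case = contains_any("The FOO bar", r"\bfoo\b", ignore_case=True)
```

Both `by_search` and `by_fullmatch` are `True`, `no_boundary` is `False`, and `mixed_case` is `True`.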
Examples
regex.contains("[email protected]", "@(google|sublimesecurity)") -> true
# use icontains for case-insensitive substring matches
regex.icontains("[email protected]", "@(google|sublimesecurity)") -> true
# if multiple regular expressions are provided, only one needs to match
regex.contains("[email protected]", "@google", "@sublimesecurity") -> true
count / icount
regex.count(input: string, pattern: regexp) -> int
regex.icount(input: string, pattern: regexp) -> int
regex.count is used to count the number of times pattern matches input. Matches are greedy. For example, regex.count('hello', '.*') will be 1.
For case-insensitive regular expression matching, use regex.icount.
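Counting non-overlapping, greedy matches can be sketched in Python as below. `count_matches` is a hypothetical helper and the sample strings are invented. Python's `re` is not RE2, and it can report an extra empty match for patterns such as `.*`, so that particular case is deliberately not reproduced here.

```python
import re


def count_matches(value: str, pattern: str, ignore_case: bool = False) -> int:
    """Hypothetical stand-in: number of non-overlapping matches of pattern."""
    flags = re.IGNORECASE if ignore_case else 0
    return sum(1 for _ in re.finditer(pattern, value, flags))


# Greedy quantifier: one long match rather than four short ones.
greedy_run = count_matches("aaaa", "a+")
single_chars = count_matches("aaaa", "a")

# Runs of two or more punctuation marks: "...", "?!" and "??".
punctuation_runs = count_matches("Wait... what?! Really??", r"[!?.]{2,}")

# Characters outside the ASCII range (i with diaeresis, e with acute accent).
non_ascii = count_matches("na\u00efve caf\u00e9", r"[^\x00-\x7F]")

# Case-insensitive variant, analogous to icount.
any_case = count_matches("Foo foo FOO", "foo", ignore_case=True)
```

The results are `1`, `4`, `3`, `2`, and `3` respectively, which shows why thresholds such as `> 20` or `> 3` in the rules compare against a number rather than a boolean.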
Examples
# Find large numbers of unusual characters
regex.count(body.current_thread.text, '[^\x00-\x7F]') > 20
# Find multiple uses of excessive punctuation
regex.count(body.current_thread.text, '[!?.]{2,}') > 3
extract / iextract
regex.extract(input: string, pattern: regexp) -> [RegexMatch]
regex.iextract(input: string, pattern: regexp) -> [RegexMatch]
regex.extract is used to return all regular expression matches within a string, including submatches for capture groups. This is similar to regex.contains, but instead of returning a boolean true/false for whether a match exists, it returns the complete match and the individual submatches.
Each returned match has the following fields:
full_match: The text matched by the complete regular expression, including all capture groups.
groups: A list of all the strings matched by capture groups. This will always be the same length as the number of capture groups, and individual captures are never null but "".
named_groups: A mapping of string -> string for capture groups with names. In RE2 syntax, a named group is written as (?P<my_capture>.*), where "my_capture" will be one of the keys of the mapping. The resulting string, .named_groups["my_capture"], is the value matched by that group.
For case-insensitive regular expression extraction, use regex.iextract.
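The shape of the returned matches can be mimicked with a short Python sketch. The `RegexMatch` dataclass and the `extract` helper below are hypothetical reconstructions from the field descriptions above, not the engine's actual types. Python's `re` is not RE2, and the display name is a made-up sample.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RegexMatch:
    """Hypothetical mirror of the fields described above."""
    full_match: str
    groups: List[str]
    named_groups: Dict[str, str]


def extract(value: str, pattern: str, ignore_case: bool = False) -> List[RegexMatch]:
    """Hypothetical stand-in for the described extract / iextract semantics."""
    flags = re.IGNORECASE if ignore_case else 0
    return [
        RegexMatch(
            full_match=m.group(0),
            # default="" makes non-participating groups "" instead of None,
            # so the list length always equals the number of capture groups.
            groups=list(m.groups(default="")),
            named_groups=m.groupdict(default=""),
        )
        for m in re.finditer(pattern, value, flags)
    ]


matches = extract(
    "Alice Smith (via Google Docs)",
    r"\A(?P<sender_display_name>.*)\((?:via )?google",
    ignore_case=True,
)
first = matches[0]

# Alternation: each match fills one group and leaves the other as "".
alternatives = extract("ab", r"(a)|(b)")
```

For the sample display name, `first.full_match` is `"Alice Smith (via Google"` and `first.named_groups["sender_display_name"]` is `"Alice Smith "` (with the trailing space). For the alternation, the two matches have `groups` equal to `["a", ""]` and `["", "b"]`.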
Examples
// With positional capture groups
any(regex.iextract(sender.display_name, '\A(.*)\((?:via )?Google'),
    any(.groups, . in~ $org_display_names)
)
// With named capture groups
any(regex.iextract(sender.display_name,
        '\A(?P<sender_display_name>.*)\((?:via )?Google'
    ),
    .named_groups["sender_display_name"] in~ $org_display_names
)
