A regular expression (regexp) is a specific sequence of characters that broadly or narrowly matches patterns in your data. You can use regular expressions to create more flexible filters in charts and controls. You can also use the following regular expression functions in calculated field formulas:
REGEXP_CONTAINS | Returns true if the input value contains the regular expression pattern, otherwise returns false. Learn more. |
REGEXP_EXTRACT | Returns the first matching substring in the input value that matches the regular expression pattern. Learn more. |
REGEXP_MATCH | Returns true if the input value matches the regular expression pattern, otherwise returns false. Learn more. |
REGEXP_REPLACE | Replaces all occurrences of text that match the regular expression pattern in the input value with the replacement string. Learn more. |
Alternatives to using regular expressions
Constructing regular expressions can be complex. Before using a regexp function, consider whether using a simpler text function will achieve your goal. The following functions provide regular expression-like functionality without requiring you to know regexp syntax.
Function | Description |
---|---|
CONTAINS_TEXT |
Returns true if the specified text is found in the field or expression, otherwise returns false. |
ENDS_WITH |
Returns true if the field or expression ends with the specified text, otherwise returns false. |
LEFT_TEXT |
Returns a number of characters from the beginning of a specified string. |
REPLACE | Returns a copy of the original text with all occurrences of the search text substituted with the replacement text. |
RIGHT_TEXT |
Returns a number of characters from the end of a specified string. |
STARTS_WITH |
Returns true if the field or expression begins with the specified text, otherwise returns false. |
TRIM |
Returns text with leading and trailing spaces removed. |
Regular expression examples
Extracts the top-level directory in a URL:
REGEXP_EXTRACT(URL, '^https://[^/]+/([^/]+)/')
For example, if the URL
field contained this page's address, the previous function would return looker-studio
.
Categorize ad campaigns by language:
For example, applying this regular expression to the Campaign dimension in the Google Analytics Demo account gives these results:
Campaign | Language |
---|---|
Campaign #1 | Other language |
1000549 | Google Analytics Demo | DR | apontes | NA | US | en | Hybrid | AW SEM | BKWS | ~ AW - Google Brand (US) | English |
1000549 | Google Analytics Demo | DR | apontes | NA | CA | es | Hybrid | AW SEM | BKWS | ~ AW - YouTube (CA) | Spanish |
Swap the order of sections in a string:
In the previous example, the sections are separated by a colon (:).
Metacharacters
Metacharacters are characters that have special meaning in a regular expression. Following are some of the more common metacharacters you can use. Note that these examples will open in the Google Analytics Help Center, but the information presented there applies equally to Looker Studio.
See google/RE2 Github documentation for complete regular expression syntax.
Wildcards
. | Matches any single character (letter, number, or symbol). | 1. matches 10, 1A 1.1 matches 111, 1A1 Examples |
? | Matches the preceding character 0 or 1 times. | 10? matches 1, 10 Examples |
+ | Matches the preceding character 1 or more times. | 10+ matches 10, 100 Examples |
* | Matches the preceding character 0 or more times. | 1* matches 1, 10 Examples |
| | Creates an OR match. Do not use at the end of an expression. |
1|10 matches 1, 10 Examples |
Anchors
^ | Matches the adjacent characters at the beginning of a string. | ^10 matches 10, 100, 10x ^10 does not match 110, 110x Examples |
$ | Matches the adjacent characters at the end of a string. | 10$ matches 110, 1010 10$ does not match 100, 10x Examples |
Groups
( ) | Matches the enclosed characters in exact order anywhere in a string. Also used to group other expressions. |
(10) matches 10, 101, 1011 ([0-9]|[a-z]) matches any number or lowercase letter Examples |
[ ] | Matches the enclosed characters in any order anywhere in a string. | [10] matches 012, 120, 210 Examples |
- | Creates a range of characters within brackets to match anywhere in a string. | [0-9] matches any number 0 through 9 Examples |
Escape
\ |
Indicates that the adjacent character should be interpreted literally rather than as a regex metacharacter. |
\. indicates that the adjacent dot should be interpreted as a period or decimal rather than as a wildcard 216\.239\.32\.34 matches 216.239.32.34 Examples |
Character classes
\d | digits (≡ [0-9]) |
\D | not digits (≡ [^0-9]) |
\s | whitespace (≡ [\t\n\f\r ]) |
\S | not whitespace (≡ [^\t\n\f\r ]) |
\w | word characters (≡ [0-9A-Za-z_]) |
\W | not word characters (≡ [^0-9A-Za-z_]) |
Tips
Use simple expressions
Keep your regular expressions simple. Simple expressions are easier for another user to interpret and modify.
Case-sensitivity
(?i)
flag. For example, this expression extracts both "abc123" and "ABC123":REGEXP_EXTRACT(MyField, '(?i)(a.*)')
Escape the backslash
As previously noted, you use the backslash (\) to escape regular expression metacharacters when you need those characters to be interpreted literally. To match a backslash in a quoted string, you'll need to escape that as well, resulting in 2 backslashes in your expression. For example:
REGEXP_REPLACE(String, "(\\[.*\\])\\+(\\[.*\\])","\\2 \\1")
As an alternative, consider using the Raw string literal prefix, R:
REGEXP_REPLACE(String, R"(\[.*\])\+(\[.*\])",R"\2 \1")