Using Regular Expressions
Regular expressions are a way for matching strings of text, e.g., particular characters, words, or phrases. They can be used in searching, editing and manipulating your data. You can use Java-style regular expressions in formulas by typing them into the formula or using them in a filter. Use them to process text for items such as email addresses, file names and extensions, or company names.
Using Regular Expressions in Formulas#
- Click a column in a Workbook. The the current formula for that column is displayed in the Fx bar.
- Create or edit the formula using regular expressions. Then press Enter.
Info
The following functions allow regular expressions to be used as arguments: INDEX(), MATCHES(), NGRAM(), REGEX(), REGEXTRACT(), REPLACEALL(), and TOKENIZE()
Example
If your Workbook had a column listing all the users of your website, you could use the MATCHES() function and .*\d+.*
to find all user names containing numbers. Or you could use the REGEXTRACT() function and [JMS].*
to tokenize all user names that start with "J", "M" or "S".
Using Regular Expressions in Filters#
Here is a quick overview of how filters work with regular expressions. Refer to Filtering Data to learn more about filters.
Using regular expressions in simple filters:
- Click the Filter icon.
- Select the data to be filtered. Only string data support regular expressions. Select matches regexp from the drop-down list.
- Create the regular expression.
- Click the + (plus) button to add more columns to the filter if desired.
- Click Filter Sheet.
Using regular expressions in advanced filters:
- Click the Filter icon.
- Click the Advanced tab.
- Enter a function which uses regular expressions as an argument.
- Add more functions using the logical AND or OR operators (
&&
or||
respectively) - Click Filter Sheet.
Escape Character
As with other types of regular expressions, Java regular expressions include pre-defined character classes, e.g., \d
(any digit from 0 to 9) or \s
(a whitespace character, like tab or return). As you can see these constructs include a \ (backslash). Normally when using Java regular expressions the defined character classes must also be preceded by a backslash, e.g., \\d
. When using the Formula Builder in Spectrum this convention isn't followed, the backslash is inserted automatically. However, when inserting a regular expression directly into the formula field, this step isn't done.
Construct | Meaning | Used in the Formula Builder | Used in the Formula Field |
---|---|---|---|
.*\d | any character(s) followed by a number | .*\d | .*\\d |
\s | a white space character | \s | \\s |
Useful Links#
More information about regular expressions can be found on the web. Two sites we recommend are the Java Tutorials and 10 Regular Expressions You Should Know. If you have created a regular expression and want to make sure that it is correct or if you need to change it into a java string then RegexPlanet's Regular Expression Test Page is just what you need.
See Importing with Regular Expressions to learn more.