Regular Expressions

Regular expressions (regex) match strings of text against patterns of characters. In OU Campus, a regex search can be performed with the Find and Replace feature instead of a Literal Text search, which is useful when manipulating large amounts of text-based content. Regexes are also used in the site settings, where the File Naming and Binary File Naming settings restrict the characters that can be used when a page is created or a binary file is uploaded. Templates may occasionally include regexes as well.

Regex find and replace is available to end users and administrators who have access to the Source Editor. Regex behavior differs slightly between the Source Editor and Find and Replace: the Source Editor matches line by line, whereas Find and Replace searches the whole page.
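To make the difference concrete, the following sketch uses Java's java.util.regex package, whose syntax the supported-expressions list on this page follows. It assumes that "line by line" means each line is searched independently; the class and method names are hypothetical and do not describe how OU Campus is implemented.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of the two search scopes; not OU Campus code.
class SearchScope {
    // Counts the matches of p in one piece of text.
    static int count(Pattern p, String text) {
        Matcher m = p.matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // Line-by-line scope: each line is searched on its own, so no match can cross a line break.
    static int countPerLine(Pattern p, String text) {
        int total = 0;
        for (String line : text.split("\n", -1)) {
            total += count(p, line);
        }
        return total;
    }

    // Whole-page scope: the pattern sees the entire text at once.
    static int countWholePage(Pattern p, String text) {
        return count(p, text);
    }

    public static void main(String[] args) {
        Pattern paragraph = Pattern.compile("(?s)<p>.*?</p>");
        String page = "<p>one line</p>\n<p>spans\ntwo lines</p>";
        System.out.println("line by line: " + countPerLine(paragraph, page));  // 1
        System.out.println("whole page:   " + countWholePage(paragraph, page)); // 2
    }
}
```

A pattern that spans a line break, such as the second paragraph in the sample, is found by the whole-page search but not by the line-by-line search.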

Examples

Regexes can be used for a variety of purposes within OU Campus, including enforcing a file naming convention through the site settings and matching content with Find and Replace. The examples below show both uses.

Restricting Characters on Page Creation and Upload

Restricting the character set that can be used when naming a new page or uploading a file, particularly a binary file, helps ensure that URLs follow best practices. Common naming conventions allow only lowercase letters, numbers, hyphens, underscores, and periods. The regex for this naming convention is: [a-z0-9\-_.]*

In OU Campus, this regex can be entered in the File Naming and Binary File Naming site settings to restrict file names to these characters.
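A minimal Java sketch of how the naming-convention regex accepts or rejects names. It assumes the entire file name must match the pattern, which is what Matcher.matches() checks; the class name and sample file names are hypothetical.

```java
import java.util.regex.Pattern;

// Sketch: check candidate file names against the naming-convention regex.
// Assumption: the whole name must match, not just part of it.
class FileNameCheck {
    // Lowercase letters, digits, hyphens, underscores, and periods only.
    private static final Pattern ALLOWED = Pattern.compile("[a-z0-9\\-_.]*");

    static boolean isAllowed(String name) {
        // matches() succeeds only if every character of the name is in the class
        return ALLOWED.matcher(name).matches();
    }

    public static void main(String[] args) {
        String[] names = {"about-us.html", "annual_report_2024.pdf", "About Us.html", "photo#1.jpg"};
        for (String name : names) {
            System.out.println(name + " -> " + (isAllowed(name) ? "allowed" : "rejected"));
        }
    }
}
```

Because the pattern ends in *, it also accepts an empty name; replacing * with + would require at least one character.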

(Figures: Example of Binary File Naming Settings; Site Settings File Naming)

For more information about using file naming in the site settings, visit the File Naming page.

For more information about setting up binary file naming in the site settings, visit the Binary File Naming page.

Using Regex in Find and Replace

Regular expressions let Find and Replace match text by pattern rather than by an exact literal string, and let the replacement reuse parts of what was matched.

For more information about using Find and Replace, visit the Find and Replace page.

Finding Editable Region Tags

<!-- com\.omniupdate\.div.*?-->.*?<!-- /com\.omniupdate\.div -->

Combined with the "dot matches new lines" option in PowerGREP, this pattern matches the editable regions of any tagged page. Each .*? sequence matches any characters, as few as possible, between the text that comes before and after it, so the pattern returns every editable region separately instead of one match running from the first region's start to the last region's end. In Find and Replace, the (?s) prefix described under Case Insensitive, Multiline, and Dotall Matches has the same effect as the "dot matches new lines" option.
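A sketch of the same search in Java, where the Pattern.DOTALL flag plays the role of PowerGREP's "dot matches new lines" option. The sample page and its label attributes are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EditableRegions {
    // The editable-region pattern; DOTALL lets .*? run across line breaks.
    private static final Pattern REGION = Pattern.compile(
            "<!-- com\\.omniupdate\\.div.*?-->.*?<!-- /com\\.omniupdate\\.div -->",
            Pattern.DOTALL);

    // Collects every editable region in the page, in document order.
    static List<String> find(String page) {
        List<String> regions = new ArrayList<>();
        Matcher m = REGION.matcher(page);
        while (m.find()) {
            regions.add(m.group());
        }
        return regions;
    }

    public static void main(String[] args) {
        // Hypothetical page fragment with two editable regions.
        String page = String.join("\n",
                "<body>",
                "<!-- com.omniupdate.div label=\"maincontent\" -->",
                "<p>Main content</p>",
                "<!-- /com.omniupdate.div -->",
                "<!-- com.omniupdate.div label=\"sidebar\" -->",
                "<p>Sidebar</p>",
                "<!-- /com.omniupdate.div -->",
                "</body>");
        for (String region : find(page)) {
            System.out.println("---");
            System.out.println(region);
        }
    }
}
```

Without Pattern.DOTALL, neither region is found, because .*? cannot cross the line breaks inside each region.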

Matching the First Line of Any File

(\A.*)

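This pattern matches the first line of any file: \A anchors the match to the beginning of the input, and . stops at the first line break unless dotall matching is enabled. It is useful for reading a file's header information to determine the page type (ASP, JSP, PHP, HTML, etc.).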
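A minimal Java sketch, assuming dotall mode is off so that . stops at the first line terminator. The class name and sample file contents are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstLine {
    // \A anchors at the start of the input; without DOTALL, . stops at \n or \r.
    private static final Pattern FIRST_LINE = Pattern.compile("(\\A.*)");

    static String firstLine(String file) {
        Matcher m = FIRST_LINE.matcher(file);
        // \A.* can match an empty string, so find() succeeds even for an empty file.
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        String phpPage = "<?php include 'header.inc'; ?>\n<html>\n</html>";
        String jspPage = "<%@ page contentType=\"text/html\" %>\r\n<html>";
        System.out.println(firstLine(phpPage));
        System.out.println(firstLine(jspPage));
    }
}
```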

Supported Regular Expressions in Find and Replace

Characters
x The character x
\\ The backslash character
\0n The character with octal value 0n (0 <= n <= 7)
\0nn The character with octal value 0nn (0 <= n <= 7)
\0mnn The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)
\xhh The character with hexadecimal value 0xhh
\uhhhh The character with hexadecimal value 0xhhhh
\t The tab character ('\u0009')
\n The newline (line feed) character ('\u000A')
\r The carriage-return character ('\u000D')
\f The form-feed character ('\u000C')
\a The alert (bell) character ('\u0007')
\e The escape character ('\u001B')
\cx The control character corresponding to x
Character classes
[abc] a, b, or c (simple class)
[^abc] Any character except a, b, or c (negation)
[a-zA-Z] a through z or A through Z, inclusive (range)
[a-d[m-p]] a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]] d, e, or f (intersection)
[a-z&&[^bc]] a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]] a through z, and not m through p: [a-lq-z] (subtraction)
Predefined character classes
. Any character (may or may not match line terminators)
\d A digit: [0-9]
\D A non-digit: [^0-9]
\s A whitespace character: [ \t\n\x0B\f\r]
\S A non-whitespace character: [^\s]
\w A word character: [a-zA-Z_0-9]
\W A non-word character: [^\w]
POSIX character classes (US-ASCII only)
\p{Lower} A lower-case alphabetic character: [a-z]
\p{Upper} An upper-case alphabetic character: [A-Z]
\p{ASCII} All ASCII: [\x00-\x7F]
\p{Alpha} An alphabetic character: [\p{Lower}\p{Upper}]
\p{Digit} A decimal digit: [0-9]
\p{Alnum} An alphanumeric character: [\p{Alpha}\p{Digit}]
\p{Punct} Punctuation: One of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
\p{Graph} A visible character: [\p{Alnum}\p{Punct}]
\p{Print} A printable character: [\p{Graph}\x20]
\p{Blank} A space or a tab: [ \t]
\p{Cntrl} A control character: [\x00-\x1F\x7F]
\p{XDigit} A hexadecimal digit: [0-9a-fA-F]
\p{Space} A whitespace character: [ \t\n\x0B\f\r]
Boundary matchers
^ The beginning of a line
$ The end of a line
\b A word boundary
\B A non-word boundary
\A The beginning of the input
\G The end of the previous match
\Z The end of the input but for the final terminator, if any
\z The end of the input
Greedy quantifiers
X? X, once or not at all
X* X, zero or more times
X+ X, one or more times
X{n} X, exactly n times
X{n,} X, at least n times
X{n,m} X, at least n but not more than m times
Reluctant quantifiers
X?? X, once or not at all
X*? X, zero or more times
X+? X, one or more times
X{n}? X, exactly n times
X{n,}? X, at least n times
X{n,m}? X, at least n but not more than m times
Possessive quantifiers
X?+ X, once or not at all
X*+ X, zero or more times
X++ X, one or more times
X{n}+ X, exactly n times
X{n,}+ X, at least n times
X{n,m}+ X, at least n but not more than m times
Logical operators
XY X followed by Y
X|Y Either X or Y
(X) X, as a capturing group
Back references
$n In the Replace field, inserts the text matched by the nth capturing group

Special constructs (non-capturing)
(?:X) X, as a non-capturing group
(?=X) X, via zero-width positive lookahead
(?!X) X, via zero-width negative lookahead
(?<=X) X, via zero-width positive lookbehind
(?<!X) X, via zero-width negative lookbehind
(?>X) X, as an independent, non-capturing group

Source: http://www.oracle.com/technetwork/java/index.html#sum
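Because the list above follows Java's Pattern documentation, the sketch below runs a few of its constructs through java.util.regex: class intersection and subtraction, the three quantifier styles, a word boundary, lookahead and lookbehind, and a $n back reference in a replacement. It assumes Find and Replace treats these constructs the way java.util.regex does; the class name, helper names, and sample strings are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ConstructDemo {
    // Returns every match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    // Returns the first match, or "(no match)" if there is none.
    static String first(String regex, String input) {
        List<String> matches = findAll(regex, input);
        return matches.isEmpty() ? "(no match)" : matches.get(0);
    }

    public static void main(String[] args) {
        String letters = "abcdefghijklmnopqrstuvwxyz";
        // Character class intersection and subtraction
        System.out.println(String.join("", findAll("[a-z&&[def]]", letters)));  // def
        System.out.println(String.join("", findAll("[a-z&&[^m-p]]", letters))); // abcdefghijklqrstuvwxyz

        // Greedy, reluctant, and possessive quantifiers on the same input
        String html = "<b>bold</b> and <i>italic</i>";
        System.out.println(first("<.*>", html));   // <b>bold</b> and <i>italic</i>
        System.out.println(first("<.*?>", html));  // <b>
        System.out.println(first("<.*+>", html));  // (no match)

        // Word boundary: only the standalone word "cat"
        System.out.println(findAll("\\bcat\\b", "cat catalog bobcat")); // [cat]

        // Lookahead and lookbehind: the lookaround text is not part of the match
        System.out.println(findAll("\\w+(?=\\.html)", "index.html news.php about.html")); // [index, about]
        System.out.println(findAll("(?<=USD )\\d+", "USD 40, EUR 25, USD 15"));         // [40, 15]

        // Back references in the replacement string
        System.out.println("Smith, John".replaceAll("(\\w+), (\\w+)", "$2 $1")); // John Smith
    }
}
```

The possessive form finds nothing because .*+ consumes the rest of the input and never gives characters back, so the closing > can never match.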

Case Insensitive, Multiline, and Dotall Matches
To change case sensitivity, multiline, or dotall behavior, add one of the following preprocessing instructions as a prefix to your regular expression:
Preprocessing instructions
(?i) Case-insensitive matching. Example: (?i)ANY case
(?m) Multiline matching: ^ and $ match at the beginning and end of each line. Example: (?m)^search from the beginning of a text line to the end$
(?s) Dotall matching: . matches everything, including line terminators. Example: (?s)<div>(.*?)</div>
(?sim) All three combined
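A short Java sketch showing what each prefix changes, assuming the prefixes behave like Java's embedded flags; the class name and sample text are hypothetical.

```java
import java.util.regex.Pattern;

class EmbeddedFlags {
    // True if regex matches anywhere in input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String page = "<DIV>\nHello\n</DIV>";
        System.out.println(found("<div>", page));                    // false: case-sensitive by default
        System.out.println(found("(?i)<div>", page));                // true
        System.out.println(found("^Hello$", page));                  // false: ^ and $ anchor to the whole input
        System.out.println(found("(?m)^Hello$", page));              // true
        System.out.println(found("(?i)<div>(.*?)</div>", page));     // false: . stops at line breaks
        System.out.println(found("(?si)<div>(.*?)</div>", page));    // true
        System.out.println(found("(?sim)^<div>(.*?)</div>$", page)); // true
    }
}
```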