Last updated: 4/8/2014

Regular Expressions

Regular expressions (regexes) match strings of text against patterns of characters. In Global Find and Replace, a regex search can be used instead of a Literal Text search, which is useful when manipulating large amounts of text-based content. Regexes are also used when configuring a site, to restrict the characters allowed when creating a page or uploading a binary file. Templates may occasionally contain regexes as well.


 


Examples

Regexes can be used for a variety of purposes. Below are a few examples of how regular expressions can be used within OU Campus.

Restricting Characters on Page Creation and Upload

Restricting the characters allowed when creating and naming a new page or uploading a file (particularly a binary file) helps ensure that URLs follow best practices. Common naming conventions allow only lowercase letters, numbers, hyphens, underscores, and periods. The regex for this naming convention is: [a-z0-9\-_.]*

In OU Campus, this regex can be entered in the Site Settings screen.
Site Settings Regex 
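The following Java sketch shows which file names the naming-convention pattern accepts. It uses java.util.regex on the assumption, based on the Oracle source cited for the syntax summary on this page, that OU Campus follows Java regex syntax. The class name NameCheck is hypothetical, and the sketch assumes the entire file name must match the pattern, which may not be exactly how the Site Settings check is applied.

```java
import java.util.regex.Pattern;

final class NameCheck {
    // The naming-convention regex from the text. In a Java string literal,
    // the regex backslash before the hyphen must itself be escaped.
    static final Pattern ALLOWED = Pattern.compile("[a-z0-9\\-_.]*");

    // matches() succeeds only if the WHOLE name fits the pattern.
    // (find() would succeed on any input, since * can match an empty substring.)
    static boolean isAllowed(String name) {
        return ALLOWED.matcher(name).matches();
    }

    public static void main(String[] args) {
        String[] names = {"about-us.html", "news_2014.pdf", "About Us.html", "report#1.pdf"};
        for (String name : names) {
            System.out.println(name + " -> " + isAllowed(name));
        }
    }
}
```

Because the pattern ends in *, it also accepts an empty string; using + instead would require at least one character.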

 

Using Regex in Global Find and Replace

Regular expressions are useful for defining find and replace matches programmatically in Global Find and Replace. The example below shows how to use a back reference ($1) to replace a specific <span> tag with an <h1> tag while preserving the content between the tags. The back reference points to a capturing group, marked by parentheses in the Find pattern, and is written in the Replace field as a dollar sign followed by the capturing group number.

Example of using a back reference with Global Find and Replace
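A minimal Java sketch of the same kind of replacement, using java.util.regex, whose Matcher.replaceAll also writes back references as $1. The <span class="title"> markup and the class name SpanToH1 are hypothetical:

```java
import java.util.regex.Pattern;

final class SpanToH1 {
    // Hypothetical Find pattern: a <span class="title"> element.
    // (.*?) is capturing group 1; the reluctant quantifier stops at the first </span>.
    static final Pattern FIND = Pattern.compile("<span class=\"title\">(.*?)</span>");

    // In the replacement string, $1 inserts whatever group 1 captured.
    static String convert(String html) {
        return FIND.matcher(html).replaceAll("<h1>$1</h1>");
    }

    public static void main(String[] args) {
        String page = "<span class=\"title\">Admissions</span><p>Apply today.</p>";
        System.out.println(convert(page)); // prints: <h1>Admissions</h1><p>Apply today.</p>
    }
}
```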

Finding Editable Region Tags

<!-- com\.omniupdate\.div.*?-->.*?<!-- /com\.omniupdate\.div -->

Combined with the "dot match new lines" feature of PowerGREP, this pattern can be used to match the editable regions of any tagged page. Each .*? sequence instructs the regular expression engine to match any characters, as few as possible, between the text that comes before and after it. This allows the pattern to return every editable region it finds as a separate match.
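The same search can be sketched in Java, where the Pattern.DOTALL flag plays the role of "dot match new lines". The page markup is a hypothetical, simplified example (real editable-region tags may carry other attributes), and the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EditableRegions {
    // The pattern from the text. DOTALL lets each .*? span line breaks.
    static final Pattern REGION = Pattern.compile(
            "<!-- com\\.omniupdate\\.div.*?-->.*?<!-- /com\\.omniupdate\\.div -->",
            Pattern.DOTALL);

    // Collects every editable region, from the opening comment through the closing one.
    static List<String> find(String page) {
        List<String> regions = new ArrayList<>();
        Matcher m = REGION.matcher(page);
        while (m.find()) {
            regions.add(m.group());
        }
        return regions;
    }

    public static void main(String[] args) {
        // Hypothetical, simplified page markup.
        String page = String.join("\n",
                "<!-- com.omniupdate.div label=\"main\" -->",
                "<p>Welcome</p>",
                "<!-- /com.omniupdate.div -->",
                "<p>Not editable</p>",
                "<!-- com.omniupdate.div label=\"sidebar\" -->",
                "<p>Links</p>",
                "<!-- /com.omniupdate.div -->");
        System.out.println(find(page).size() + " regions found"); // prints: 2 regions found
    }
}
```

Without Pattern.DOTALL (or a (?s) prefix), this sample yields no matches, because each region spans several lines.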

Matching the First Line of Any File

(\A.*)

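This pattern matches the first line of any file: \A anchors the match at the beginning of the input, and because . does not match line terminators by default, .* stops at the end of the first line. This is useful for inspecting a file's header to determine the type of page (ASP, JSP, PHP, HTML, etc.).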
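A short Java sketch that applies the pattern to a file's contents and returns capturing group 1. It uses java.util.regex, assumed here to behave like the engine described on this page; the class name and the sample PHP header are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FirstLine {
    // \A anchors at the very start of the input. DOTALL is off,
    // so . stops at the first line terminator and group 1 is the first line.
    static final Pattern FIRST_LINE = Pattern.compile("(\\A.*)");

    static String firstLine(String fileContents) {
        Matcher m = FIRST_LINE.matcher(fileContents);
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        String php = "<?php include('header.inc'); ?>\n<html>\n</html>";
        System.out.println(firstLine(php)); // prints: <?php include('header.inc'); ?>
    }
}
```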


Supported Regular Expressions in Global Find and Replace

The following list outlines the regular expression syntax supported by Global Find and Replace:

Characters
x The character x
\\ The backslash character
\0n The character with octal value 0n (0 <= n <= 7)
\0nn The character with octal value 0nn (0 <= n <= 7)
\0mnn The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)
\xhh The character with hexadecimal value 0xhh
\uhhhh The character with hexadecimal value 0xhhhh
\t The tab character ('\u0009')
\n The newline (line feed) character ('\u000A')
\r The carriage-return character ('\u000D')
\f The form-feed character ('\u000C')
\a The alert (bell) character ('\u0007')
\e The escape character ('\u001B')
\cx The control character corresponding to x

Character classes
[abc] a, b, or c (simple class)
[^abc] Any character except a, b, or c (negation)
[a-zA-Z] a through z or A through Z, inclusive (range)
[a-d[m-p]] a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]] d, e, or f (intersection)
[a-z&&[^bc]] a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]] a through z, and not m through p: [a-lq-z] (subtraction)
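The union, intersection (&&), and subtraction forms above are not supported by every regex engine. This Java sketch, using java.util.regex, lists which lowercase letters each example class accepts; the class and helper names are hypothetical.

```java
import java.util.regex.Pattern;

final class ClassOps {
    // Returns the letters from 'a' to 'z' that a single-character class accepts.
    static String accepted(String charClass) {
        Pattern p = Pattern.compile(charClass);
        StringBuilder sb = new StringBuilder();
        for (char c = 'a'; c <= 'z'; c++) {
            if (p.matcher(String.valueOf(c)).matches()) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(accepted("[a-d[m-p]]"));    // union: abcdmnop
        System.out.println(accepted("[a-z&&[def]]"));  // intersection: def
        System.out.println(accepted("[a-z&&[^bc]]"));  // subtraction: a, then d through z
        System.out.println(accepted("[a-z&&[^m-p]]")); // subtraction: a-l, then q-z
    }
}
```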

Predefined character classes
. Any character (may or may not match line terminators)
\d A digit: [0-9]
\D A non-digit: [^0-9]
\s A whitespace character: [ \t\n\x0B\f\r]
\S A non-whitespace character: [^\s]
\w A word character: [a-zA-Z_0-9]
\W A non-word character: [^\w]

POSIX character classes (US-ASCII only)
\p{Lower} A lower-case alphabetic character: [a-z]
\p{Upper} An upper-case alphabetic character: [A-Z]
\p{ASCII} All ASCII: [\x00-\x7F]
\p{Alpha} An alphabetic character: [\p{Lower}\p{Upper}]
\p{Digit} A decimal digit: [0-9]
\p{Alnum} An alphanumeric character: [\p{Alpha}\p{Digit}]
\p{Punct} Punctuation: One of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
\p{Graph} A visible character: [\p{Alnum}\p{Punct}]
\p{Print} A printable character: [\p{Graph}\x20]
\p{Blank} A space or a tab: [ \t]
\p{Cntrl} A control character: [\x00-\x1F\x7F]
\p{XDigit} A hexadecimal digit: [0-9a-fA-F]
\p{Space} A whitespace character: [ \t\n\x0B\f\r]

Boundary matchers
^ The beginning of a line
$ The end of a line
\b A word boundary
\B A non-word boundary
\A The beginning of the input
\G The end of the previous match
\Z The end of the input but for the final terminator, if any
\z The end of the input
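To see how boundary matchers change what is found, this Java sketch (java.util.regex; sample strings and the helper name are illustrative) counts matches of "cat" with and without anchors:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class Boundaries {
    // Counts non-overlapping matches of regex in input.
    static int count(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "cat catalog concat cat";
        System.out.println(count("cat", text));       // 4: every occurrence
        System.out.println(count("\\bcat\\b", text)); // 2: only the standalone word
        System.out.println(count("\\Acat", text));    // 1: only at the start of the input
        System.out.println(count("cat\\z", text));    // 1: only at the very end
        // \Z tolerates one final line terminator; \z does not.
        System.out.println(count("cat\\Z", "cat\n")); // 1
        System.out.println(count("cat\\z", "cat\n")); // 0
    }
}
```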

Greedy quantifiers
X? X, once or not at all
X* X, zero or more times
X+ X, one or more times
X{n} X, exactly n times
X{n,} X, at least n times
X{n,m} X, at least n but not more than m times

Reluctant quantifiers
X?? X, once or not at all
X*? X, zero or more times
X+? X, one or more times
X{n}? X, exactly n times
X{n,}? X, at least n times
X{n,m}? X, at least n but not more than m times

Possessive quantifiers
X?+ X, once or not at all
X*+ X, zero or more times
X++ X, one or more times
X{n}+ X, exactly n times
X{n,}+ X, at least n times
X{n,m}+ X, at least n but not more than m times
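The three quantifier families differ in how much they consume and whether they give characters back. This Java sketch (java.util.regex; the sample HTML and names are hypothetical) applies the greedy, reluctant, and possessive forms of .* to the same input:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class Quantifiers {
    // Returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String html = "<b>one</b> and <b>two</b>";
        // Greedy: .* takes as much as possible, then backs off to the LAST </b>.
        System.out.println(firstMatch("<b>.*</b>", html));  // <b>one</b> and <b>two</b>
        // Reluctant: .*? takes as little as possible, stopping at the FIRST </b>.
        System.out.println(firstMatch("<b>.*?</b>", html)); // <b>one</b>
        // Possessive: .*+ consumes the rest of the input and never gives any back,
        // so the following </b> can never match.
        System.out.println(firstMatch("<b>.*+</b>", html)); // null
    }
}
```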

Logical operators
XY X followed by Y
X|Y Either X or Y
(X) X, as a capturing group

Back references
$n In the Replace field, inserts the contents of the nth capturing group

Special constructs (non-capturing)
(?:X) X, as a non-capturing group
(?=X) X, via zero-width positive lookahead
(?!X) X, via zero-width negative lookahead
(?<=X) X, via zero-width positive lookbehind
(?<!X) X, via zero-width negative lookbehind
(?>X) X, as an independent, non-capturing group
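Lookarounds test text without including it in the match. The Java sketch below (java.util.regex; the file list and names are hypothetical) shows a positive lookahead, a positive lookbehind, and a negative lookahead combined with a non-capturing group:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class Lookaround {
    // Returns every match of regex in input, in order.
    static List<String> all(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String files = "index.html about.php news.html logo.png";
        // Positive lookahead: names followed by ".html"; the extension is checked, not matched.
        System.out.println(all("\\w+(?=\\.html)", files)); // [index, news]
        // Positive lookbehind: extensions preceded by a dot; the dot is not matched.
        System.out.println(all("(?<=\\.)\\w+", files));    // [html, php, html, png]
        // Negative lookahead with a non-capturing group: extension is neither html nor png.
        System.out.println(all("\\b\\w+\\.(?!(?:html|png)\\b)\\w+", files)); // [about.php]
    }
}
```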

Source: http://www.oracle.com/technetwork/java/index.html#sum

Case Insensitive, Multiline, and Dotall Matches

To modify the case sensitivity, multiline, or dotall behavior, use the following preprocessing instructions as a prefix to your regular expression:

Preprocessing instructions
(?i) Case-insensitive matching. Example: (?i)ANY case
(?m) Multiline matching (^ and $ match at the beginning and end of each line). Example: (?m)^search from the beginning of a text line to the end$
(?s) Dotall matching (. matches everything, including line terminators). Example: (?s)<div>(.*?)</div>
(?sim) All three combined
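The effect of each prefix can be checked with a short Java sketch using java.util.regex, which accepts the same embedded (?i), (?m), and (?s) flags. The sample text and the names Flags and found are hypothetical:

```java
import java.util.regex.Pattern;

final class Flags {
    // True if regex matches anywhere in input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String text = "<div>\nOmniUpdate\n</div>";
        System.out.println(found("omniupdate", text));           // false: case differs
        System.out.println(found("(?i)omniupdate", text));       // true
        System.out.println(found("^OmniUpdate$", text));         // false: ^ anchors to the start of the input
        System.out.println(found("(?m)^OmniUpdate$", text));     // true: ^ and $ anchor to each line
        System.out.println(found("<div>(.*?)</div>", text));     // false: . stops at the line break
        System.out.println(found("(?s)<div>(.*?)</div>", text)); // true
        System.out.println(found("(?sim)^omniupdate$", text));   // true
    }
}
```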

 
