Powerful Regular Expression

Regular expressions are extremely useful and work with many programming languages.

Esther Kim
May 31, 2020

What is a Regular Expression?

A regular expression (regex or regexp for short) is a sequence of characters that describes a search pattern. It is one of the standard tools for processing strings, and it makes it easy to find text that matches a pattern or to replace characters under specific conditions. Let’s dive in a little deeper.

1. Basic Rule of Patterns

Regular expressions are case sensitive by default. Each character inside the search pattern is significant, including whitespace characters (space, tab, newline).
If the literal value of a special character is required, it must be escaped with a \ (backslash). The . (dot) is a wildcard that matches any single character (in most engines, any character except a newline). Several dots in a row match exactly that many characters.

.....      // => Hello Regular expression
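To make the rules above concrete, here is a minimal sketch that assumes Python's built-in re module (the article itself is language-neutral, and the sample text is made up for illustration). It shows the dot as a wildcard, an escaped dot as a literal period, and case sensitivity.

```python
import re

text = "Hello Regular expression. Done"

# Five dots match any five characters; the first hit is at the start.
first_five = re.search(r".....", text).group()

# An unescaped dot is a wildcard, so it hits every character here.
wildcard_hits = re.findall(r".", text)

# An escaped dot matches only a literal period.
literal_dots = re.findall(r"\.", text)

# Matching is case sensitive by default: "hello" does not match "Hello".
lowercase_hit = re.search(r"hello", text)

print(first_five)          # Hello
print(len(wildcard_hits))  # 30
print(literal_dots)        # ['.']
print(lowercase_hit)       # None
```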

2. Special Characters ^ $ [ ] and Quantifier Pattern { }

Some characters have special meanings. ^ (caret) matches the beginning of the line, and the $ (dollar sign) matches the end of the line.

^HELLO     // => HELLO Regular expression HELLO
HELLO$     // => HELLO Regular expression HELLO
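The two anchors can be checked with a short sketch, again assuming Python's re module. The span() call reports where each hit starts and ends, which makes it clear that ^HELLO picks the first HELLO and HELLO$ picks the last one.

```python
import re

text = "HELLO Regular expression HELLO"

# ^ pins the pattern to the start of the line.
at_start = re.search(r"^HELLO", text)

# $ pins the pattern to the end of the line.
at_end = re.search(r"HELLO$", text)

# "Regular" exists in the text, but not at the start, so there is no hit.
not_at_start = re.search(r"^Regular", text)

print(at_start.span())  # (0, 5)
print(at_end.span())    # (25, 30)
print(not_at_start)     # None
```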

Inside [] (square brackets), a list of characters can be provided. The expression matches if any one of these characters is found, and the order of the characters is insignificant. A range of characters can be specified with - (hyphen/dash), and several ranges can be given in one expression. If a character class starts with ^, the class is negated: it matches any character that is not listed.

[EsR]       // => HELLO Regular expression HELLO
[eE]. // => HELLO Regular expression HELLO
[e-s] // => HELLO Regular expression HELLO
[a-eA-E4-6] // => HELLO Regular expression HELLO 123456
[^ELest4-6] // => HELLO Regular expression HELLO 123456
[^1-3]. // => HELLO Regular expression HELLO 123456
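Since the samples above lose their highlighting in plain text, the following sketch (assuming Python's re module) lists what three of the character classes actually hit. Note that a negated class also matches spaces, because a space is not one of the excluded characters.

```python
import re

text = "HELLO Regular expression HELLO 123456"

# A plain list: any one of E, s or R.
listed = re.findall(r"[EsR]", text)

# Several ranges in one class: a-e, A-E and 4-6.
ranges = re.findall(r"[a-eA-E4-6]", text)

# A negated class: everything except E, L, e, s, t and 4-6.
negated = "".join(re.findall(r"[^ELest4-6]", text))

print(listed)   # ['E', 'R', 's', 's', 'E']
print(ranges)   # ['E', 'e', 'a', 'e', 'e', 'E', '4', '5', '6']
print(negated)  # HO Rgular xprion HO 123
```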

{} (curly braces) can express the same thing as a run of dots, and they work after any character or character class. This enables precise specification of repetitions: {m} matches exactly m times, {m,n} matches at least m and at most n times, and {m,} matches at least m times.

.{5}        // => Hello Regular expression
[ELs]{1,3} // => HELLO Regular expressssion HELLO
[Ll]{2,} // => HELLO Regular expression HELLO
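A small sketch of the curly brace forms, assuming Python's re module. The run of four s characters is split into "sss" and "s" because {1,3} takes at most three at a time, and the last pattern is written with a closing square bracket, as a character class requires.

```python
import re

text = "HELLO Regular expressssion HELLO"

# .{5} behaves like five dots in a row.
five_a = re.search(r".{5}", text).group()
five_b = re.search(r".....", text).group()

# Between one and three characters from the class, as many as possible.
one_to_three = re.findall(r"[ELs]{1,3}", text)

# Two or more L or l characters in a row.
two_or_more = re.findall(r"[Ll]{2,}", text)

print(five_a, five_b)  # HELLO HELLO
print(one_to_three)    # ['ELL', 'sss', 's', 'ELL']
print(two_or_more)     # ['LL', 'LL']
```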

3. Quantifier Patterns * + ?

Quantifiers specify how many times a character can occur. * (star) matches zero or more times, + (plus) one or more times, and ? (question mark) zero or one time. The quantifiers *, + and ? are shorthand for the most common cases of the curly brace notation: * is equivalent to {0,}, + to {1,} and ? to {0,1}.

A*B        // => BC ABC AABC
A+B // => BC ABC AABC
A?B // => BC ABC AABC
.\* // => -@- *** @@ *** -@-
@+. // => -@- *** @@ *** -@-
-?@@?- // => -@- *** -@@- *-@@@-
@A*@ // => -@- *** @@ *** -@-
-@+- // => -@- *** -- *** -@-
[-@]* // => -@- *** @@ *** -@-
[^ ]+ // => -@- *** @@ *** -@-
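The first three samples can be verified with a short sketch that assumes Python's re module. It also checks the equivalence with the curly brace notation.

```python
import re

text = "BC ABC AABC"

star = re.findall(r"A*B", text)      # zero or more A, then B
plus = re.findall(r"A+B", text)      # one or more A, then B
optional = re.findall(r"A?B", text)  # at most one A, then B

print(star)      # ['B', 'AB', 'AAB']
print(plus)      # ['AB', 'AAB']
print(optional)  # ['B', 'AB', 'AB']

# The short forms give the same hits as their curly brace spellings.
same_star = star == re.findall(r"A{0,}B", text)
same_plus = plus == re.findall(r"A{1,}B", text)
same_optional = optional == re.findall(r"A{0,1}B", text)
print(same_star, same_plus, same_optional)  # True True True
```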

Quantifiers are greedy by default: they match as many characters as possible while still letting the rest of the pattern match. This behavior changes to matching as few characters as possible if a quantifier is followed by a question mark. Compare * with *?, + with +?, and ? with ??. This may be a little confusing, but try to remember that the lazy form starts from the minimum its quantifier allows (zero for *? and ??, one for +?) and only takes more when the rest of the pattern needs it. We can use tools (linked below) to confirm when we are not sure.

L.*        // => HELLO Regular expressssion HELLO
L.*? // => HELLO Regular expressssion HELLO
L.+ // => HELLO Regular expressssion HELLO
L.+? // => HELLO Regular expressssion HELLO
L.? // => HELLO Regular expressssion HELLO
[Ll].?? // => HELLO Regular expressssion HELLO
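The difference between greedy and lazy matching is easiest to see side by side. This is a minimal sketch assuming Python's re module; the H.*O pair is an extra illustration that is not among the samples above, added because a lazy quantifier is most useful when something follows it.

```python
import re

text = "HELLO Regular expressssion HELLO"

greedy_star = re.search(r"L.*", text).group()  # runs to the end of the line
lazy_star = re.search(r"L.*?", text).group()   # takes zero extra characters
lazy_plus = re.search(r"L.+?", text).group()   # takes exactly one extra
greedy_opt = re.search(r"L.?", text).group()   # takes the optional character
lazy_opt = re.search(r"L.??", text).group()    # skips the optional character

print(greedy_star)  # LLO Regular expressssion HELLO
print(lazy_star)    # L
print(lazy_plus)    # LL
print(greedy_opt)   # LL
print(lazy_opt)     # L

# With something after the quantifier, lazy stops at the first possible O,
# while greedy runs on to the last one.
greedy_to_o = re.search(r"H.*O", text).group()
lazy_to_o = re.search(r"H.*?O", text).group()
print(greedy_to_o == text)  # True
print(lazy_to_o)            # HELLO
```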

4. Sub Patterns ( | ?= ?! )

Alternative pieces of text can be enclosed in () (parentheses), with the alternatives separated by | (pipe).

(on|ues|ednes)   // => Monday Tuesday Wednesday
..(n|es|dnes)day // => Monday Tuesday Wednesday
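Here is a hedged sketch of both samples, assuming Python's re module. One Python-specific detail: when a pattern contains a group, re.findall returns the text of the group rather than the whole hit, so the second sample uses re.finditer to get the full match.

```python
import re

text = "Monday Tuesday Wednesday"

# With one group, findall returns what the group captured.
middles = re.findall(r"(on|ues|ednes)", text)

# finditer gives match objects; group() with no argument is the whole hit.
days = [m.group() for m in re.finditer(r"..(n|es|dnes)day", text)]

print(middles)  # ['on', 'ues', 'ednes']
print(days)     # ['Monday', 'Tuesday', 'Wednesday']
```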

5. Built-in Regular Expression Patterns

\w matches any word character (alphanumeric plus “_”). In some languages, this abbreviation is not recognized; use the character class [A-Za-z0-9_] instead. \W matches any non-word character (everything but alphanumeric plus “_”). It is equivalent to [^A-Za-z0-9_].

\s matches white space characters: space, newline, and tab. \S matches any non-whitespace character.

\d matches any digit and \D anything else. Use [0-9] if your programming language does not support this abbreviation.
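The shorthand classes can be tried out with a short sketch, assuming Python's re module and a made-up sample string. For plain ASCII text like this one, the spelled-out classes give the same hits as the abbreviations.

```python
import re

text = "user_1 paid 42 USD"

words = re.findall(r"\w+", text)   # runs of word characters
digits = re.findall(r"\d+", text)  # runs of digits
spaces = re.findall(r"\s", text)   # single whitespace characters
non_words = re.findall(r"\W", text)

print(words)   # ['user_1', 'paid', '42', 'USD']
print(digits)  # ['1', '42']
print(spaces)  # [' ', ' ', ' ']

# Spelled-out equivalents for engines without the abbreviations.
same_words = words == re.findall(r"[A-Za-z0-9_]+", text)
same_digits = digits == re.findall(r"[0-9]+", text)
print(same_words, same_digits)  # True True
```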

\b matches a word boundary. A word boundary \b is defined as a spot between two characters that has \w on one side of it and \W on the other side of it (in either order); the start and end of the string also count when the adjacent character is a word character. \B matches a non-word boundary, which is any spot where \b does not match: both sides are \w, or both sides are \W.

\b.     // => HELLO Regular expressssion
\B. // => HELLO Regular expressssion
.\b // => HELLO Regular expressssion
.\B // => HELLO Regular expressssion
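Word boundaries are positions, not characters, which is easier to see in output than in prose. The sketch below assumes Python's re module; the "cat" strings at the end are a made-up illustration of the most common use, matching a whole word only.

```python
import re

text = "HELLO Regular expressssion"

# The character right after each boundary (this includes spaces after words).
after_boundary = re.findall(r"\b.", text)

# The character right before each boundary.
before_boundary = re.findall(r".\b", text)

# Characters that follow a non-boundary position: the inside of each word.
after_non_boundary = "".join(re.findall(r"\B.", text))

print(after_boundary)      # ['H', ' ', 'R', ' ', 'e']
print(before_boundary)     # ['O', ' ', 'r', ' ', 'n']
print(after_non_boundary)  # ELLOegularxpressssion

# \bcat\b hits "cat" as a whole word, but not inside "concatenate".
whole_word = re.search(r"\bcat\b", "a cat sat")
inside_word = re.search(r"\bcat\b", "concatenate")
print(whole_word.group(), inside_word)  # cat None
```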

\A matches the beginning of the string. It is similar to ^, but in multiline mode ^ also matches after each newline. Similarly, \Z matches only at the end of the string or, in many engines, before the newline at the end of it. It is similar to $, but in multiline mode $ also matches before each newline.

\A..     // => HELLO Regular expressssion
..\Z // => HELLO Regular expressssion
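The difference from ^ and $ only shows up with multiline text, so here is a minimal sketch assuming Python's re module and its re.MULTILINE flag. In Python, \Z matches only at the very end of the string, so the made-up sample text has no trailing newline and the result does not depend on that engine detail.

```python
import re

text = "first line\nsecond line"

# In multiline mode, ^ and $ match at every line.
line_starts = re.findall(r"^\w+", text, re.MULTILINE)
line_ends = re.findall(r"\w+$", text, re.MULTILINE)

# \A and \Z ignore the flag and match only at the ends of the whole string.
string_start = re.findall(r"\A\w+", text, re.MULTILINE)
string_end = re.findall(r"\w+\Z", text, re.MULTILINE)

print(line_starts)   # ['first', 'second']
print(line_ends)     # ['line', 'line']
print(string_start)  # ['first']
print(string_end)    # ['line']
```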

6. Assertion Patterns

(?=<pattern>) is a positive lookahead: it checks that the pattern follows at the current position but does not include it in the hit.

\w+(?=O)   // => HELLO Regular expressssion
\w+ // => HELLO Regular expressssion
\w+(?=\w) // => HELLO Regular expressssion
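The three samples above differ only in what must follow the word characters. This sketch, assuming Python's re module, prints the hits; the px example at the end is a made-up illustration of a typical use.

```python
import re

text = "HELLO Regular expressssion"

before_o = re.findall(r"\w+(?=O)", text)      # must be followed by O
plain = re.findall(r"\w+", text)              # no condition
before_word = re.findall(r"\w+(?=\w)", text)  # must be followed by a word char

print(before_o)     # ['HELL']
print(plain)        # ['HELLO', 'Regular', 'expressssion']
print(before_word)  # ['HELL', 'Regula', 'expressssio']

# Typical use: take the number, require the unit, leave the unit out.
pixels = re.findall(r"\d+(?=px)", "10px 20em 30px")
print(pixels)       # ['10', '30']
```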

(?!<pattern>) is a negative lookahead: it checks that the pattern does not follow at the current position. If the pattern is found there, there is no hit.

AAA(?!X)   // => AAAX---AAA
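A short sketch of this sample, assuming Python's re module. The span shows that only the second AAA is a hit, because the first one is followed by X.

```python
import re

text = "AAAX---AAA"

# Collect the position of every AAA that is not followed by X.
spans = [m.span() for m in re.finditer(r"AAA(?!X)", text)]

# Without the lookahead, both runs of AAA are hits.
all_spans = [m.span() for m in re.finditer(r"AAA", text)]

print(spans)      # [(7, 10)]
print(all_spans)  # [(0, 3), (7, 10)]
```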

Regular expressions are a powerful language, and that power can make them difficult to read. However, there are a variety of tools that help with these challenges, and you can find links to them below. Next time I will write the second post in this series, about regular expressions in JavaScript.

Thank you for reading my blog. I welcome thoughts, comments, and counterpoints to help me learn, evolve, and grow. Please let me know what you think!
