In this tutorial, I will take a brief look at regular expressions (regex) and how they work. My goal is to be able to write and use regular expressions in a way that is easy to understand.
A regex (short for regular expression) is a string of text that defines a search pattern you can use to match, locate, and manage text.
An example regex looks like this:
/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/
- This is a regular expression used to match an e-mail address.
Fun fact: regular expressions can also be used from the command line and within text editors to find text within a file.
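As a rough sketch of how this pattern could be used in JavaScript: the helper name `isEmail` and the sample addresses below are made up for illustration. Note that the pattern only allows lowercase letters unless a case-insensitive flag is added.

```javascript
// The e-mail pattern from above, written as a JavaScript regex literal.
const emailPattern = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/;

// Hypothetical helper: true when the whole string fits the pattern.
function isEmail(text) {
  return emailPattern.test(text);
}

console.log(isEmail("jane_doe@example.com")); // true
console.log(isEmail("not-an-email"));         // false

// The three capturing groups hold the user name, the domain, and the ending.
const parts = "jane_doe@example.com".match(emailPattern);
console.log(parts[1], parts[2], parts[3]); // jane_doe example com
```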
- Anchors
- Quantifiers
- OR Operator
- Character Classes
- Flags
- Grouping and Capturing
- Bracket Expressions
- Greedy and Lazy Match
- Boundaries
- Back-references
- Look-ahead and Look-behind
Anchors are unique in that they match a position within a string, not a character.
Examples of Anchors are as follows:
- `^` Beginning - Matches the beginning of the string.
- `$` End - Matches the end of the string.
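Here is a minimal sketch of anchors in JavaScript. The word "cat" and the variable names are arbitrary choices for illustration.

```javascript
// Anchors match positions, so they add no characters to the match itself.
const startsWithCat = /^cat/;
const endsWithCat = /cat$/;
const exactlyCat = /^cat$/;

console.log(startsWithCat.test("catalog")); // true:  "cat" is at the start
console.log(startsWithCat.test("bobcat"));  // false: "cat" is not at the start
console.log(endsWithCat.test("bobcat"));    // true:  "cat" is at the end
console.log(exactlyCat.test("cat"));        // true:  the whole string is "cat"
console.log(exactlyCat.test("cats"));       // false: extra "s" before the end
```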
Quantifiers indicate that the preceding token must be matched a certain number of times. By default, quantifiers are greedy, and will match as many characters as possible.
Examples of Quantifiers are as follows:
- `*` Star - Matches zero or more of the preceding token.
- `+` Plus - Matches one or more of the preceding token.
- `?` Optional - Matches zero or one of the preceding token, effectively making it optional.
- `{n}` Quantifier - Matches exactly n of the preceding token.
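To see that a quantifier applies to the token right before it, here is a small JavaScript sketch. The sample strings and variable names are my own picks for illustration.

```javascript
const star = /^ab*c$/;        // "b" zero or more times
const plus = /^ab+c$/;        // "b" one or more times
const optional = /^colou?r$/; // "u" zero or one time
const exact = /^\d{4}$/;      // exactly four digits

console.log(star.test("ac"), star.test("abbbc"));             // true true
console.log(plus.test("ac"), plus.test("abc"));               // false true
console.log(optional.test("color"), optional.test("colour")); // true true
console.log(exact.test("2024"), exact.test("20245"));         // true false
```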
The OR operator, also known as alternation, acts like a boolean OR, matching one sequence or another.
An example of alternation is as follows:
- `|` Alternation - Acts like a boolean OR operator. Matches the expression before or after the `|` symbol.
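A short JavaScript sketch of alternation, using made-up sample words. It also shows one thing that is easy to trip over: `|` splits the whole expression unless a group limits its reach.

```javascript
const pet = /cat|dog/;
console.log(pet.test("hotdog")); // true:  contains "dog"
console.log(pet.test("bird"));   // false: contains neither

// Without a group, this reads as "starts with gra" OR "ends with ey".
const loose = /^gra|ey$/;
// With a group, only the single letter alternates.
const grouped = /^gr(a|e)y$/;

console.log(loose.test("grab"));   // true
console.log(grouped.test("grab")); // false
console.log(grouped.test("gray"), grouped.test("grey")); // true true
```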
Character classes match a character from a specific set. There are a number of predefined character classes and you can also define your own sets.
Examples of Character Classes are as follows:
- `.` Dot - Matches any character except line breaks.
- `\d` Digit - Matches a single digit.
- `\w` Word - Matches a single word character (a letter, digit, or underscore).
- `\s` Whitespace - Matches a single whitespace character.
- `_` Character - Matches an underscore character.
- `\.` Escaped character - Matches a literal "." character.
- `-` Character - Matches a hyphen character.
- `@` Character - Matches an at symbol character.
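The predefined classes can be tried out with a few lines of JavaScript. This is only a sketch, and the sample strings and variable names are invented for illustration.

```javascript
// The g flag makes match() return every match instead of only the first.
const digits = "a1 b22".match(/\d/g);
console.log(digits); // [ '1', '2', '2' ]

const wordChars = "hi_there!".match(/\w/g).join("");
console.log(wordChars); // hi_there  (the "!" is not a word character)

const pieces = "a b\tc".split(/\s/);
console.log(pieces); // [ 'a', 'b', 'c' ]

// An unescaped dot matches any character except a line break...
console.log(/a.c/.test("abc"), /a.c/.test("a\nc"));   // true false
// ...while an escaped dot matches only a literal ".".
console.log(/a\.c/.test("abc"), /a\.c/.test("a.c")); // false true
```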
Expression flags change how the expression is interpreted. Flags are used to modify the behavior of the expression.
Examples of Flags are as follows:
- `i` Case Insensitive - Matches the expression ignoring case.
- `m` Multi-line - Makes `^` and `$` match at the beginning and end of each line, not only of the whole string.
- `s` Dot All - Lets the dot (`.`) match line breaks as well.
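In JavaScript, flags go after the closing slash of the regex literal. The sketch below uses a made-up two-line string to show what each flag changes.

```javascript
// i: ignore case
console.log(/hello/.test("HELLO"), /hello/i.test("HELLO")); // false true

const text = "one\ntwo"; // two lines, chosen only for illustration

// m: ^ and $ also match at the start and end of each line
const withoutM = /^two$/.test(text);
const withM = /^two$/m.test(text);
console.log(withoutM, withM); // false true

// s: the dot also matches a line break
const withoutS = /one.two/.test(text);
const withS = /one.two/s.test(text);
console.log(withoutS, withS); // false true
```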
Groups allow you to combine a sequence of tokens to operate on them together.
Examples of Groups are as follows:
- `(abc)` Capturing Group - Groups multiple tokens together and creates a capture group for extracting a substring.
- `(?:abc)` Non-capturing Group - Groups multiple tokens together without creating a capture group.
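A small JavaScript sketch of the difference between the two kinds of group. The date string and variable names are arbitrary examples.

```javascript
// Each capturing group becomes a numbered entry in the match result.
const date = "2024-05-17".match(/(\d{4})-(\d{2})-(\d{2})/);
console.log(date[1], date[2], date[3]); // 2024 05 17

// A non-capturing group still groups, but takes no number,
// so the month is now group 1.
const monthOnly = "2024-05-17".match(/(?:\d{4})-(\d{2})/);
console.log(monthOnly[1]);     // 05
console.log(monthOnly.length); // 2 (the full match plus one group)

// Grouping lets a quantifier apply to a whole sequence.
const laugh = /^(?:ha)+$/;
console.log(laugh.test("hahaha"), laugh.test("hah")); // true false
```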
Bracket expressions are characters enclosed in square brackets `[]`. They match any single character listed within the brackets.
Examples of Bracket Expressions are as follows:
- `[abc]` Character Set - Matches a character from the set.
- `[^abc]` Negated Set - Matches a character that is not in the set.
- `[a-z]` Range - Matches a character from within the specified range of characters in the set.
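The three forms above can be sketched in JavaScript like this. The sample strings and variable names are invented for illustration.

```javascript
// Character set: the middle letter must be "a" or "e".
const greys = "gray grey groy".match(/gr[ae]y/g);
console.log(greys); // [ 'gray', 'grey' ]

// Negated set: everything that is not a digit.
const nonDigits = "a1b2".match(/[^0-9]/g);
console.log(nonDigits); // [ 'a', 'b' ]

// Range: lowercase letters only, so the capital "H" is skipped.
const lower = "Hello".match(/[a-z]/g);
console.log(lower); // [ 'e', 'l', 'l', 'o' ]
```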
By default, quantifiers are greedy, and will match as many characters as possible. This is the behavior that is most commonly used in regular expressions.
However, you can make a quantifier lazy by adding a ? to the end of the quantifier. This will make the quantifier match as few characters as possible.
Examples of Greedy and Lazy Match are as follows:
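- `?` Lazy - Makes the preceding quantifier lazy, causing it to match as few characters as possible.
- `*?` Lazy Star - Matches zero or more of the preceding token, as few as possible.
- `+?` Lazy Plus - Matches one or more of the preceding token, as few as possible.
- `{n,m}?` Lazy Quantifier - Matches between n and m of the preceding token, as few as possible. After an exact count such as `{n}`, the `?` changes nothing, because only one count is possible.
- `(abc)??` Lazy Group - Matches the group zero or one times, preferring zero. A single `?` after a group, as in `(abc)?`, is the optional quantifier, not the lazy modifier.
- `[abc]??` Lazy Character Class - Matches a character from the set zero or one times, preferring zero. As with groups, `[abc]?` on its own just makes the set optional.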
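The difference is easiest to see side by side. In this JavaScript sketch, the HTML-like sample string is made up for illustration.

```javascript
const html = "<b>bold</b>";

// Greedy: .+ grabs as much as it can, so the match runs to the last ">".
const greedy = html.match(/<.+>/)[0];
console.log(greedy); // <b>bold</b>

// Lazy: .+? stops at the first ">" that lets the pattern succeed.
const lazy = html.match(/<.+?>/)[0];
console.log(lazy); // <b>
```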
Boundaries match the positions where a word starts or ends. They are the places between characters, kind of like a wall.
- `\b` Word Boundary - Matches a word boundary position: between a word character and a non-word character, or at the start or end of the string.
- `\B` Not Word Boundary - Matches any position that is not a word boundary. This matches a position, not a character.
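Here is a brief JavaScript sketch of both boundaries. The sample sentences and variable names are my own, picked so that "cat" appears both as a whole word and inside longer words.

```javascript
const wholeWord = /\bcat\b/;
console.log(wholeWord.test("a cat sat"));   // true:  "cat" stands alone
console.log(wholeWord.test("concatenate")); // false: "cat" is inside a word

const insideWord = /\Bcat\B/;
console.log(insideWord.test("concatenate")); // true
console.log(insideWord.test("a cat sat"));   // false

// Handy for replacing whole words only.
const replaced = "cat scatter cat".replace(/\bcat\b/g, "dog");
console.log(replaced); // dog scatter dog
```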
Back-references are used to match the same text that a previous capturing group matched.
Examples of Back-references are as follows:
- `\1` Numeric Reference - Matches the text captured by the specified group, in this case the first group.
- `(?<name>abc)` Named Capturing Group - Creates a capturing group that can be referenced via the specified name.
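A minimal JavaScript sketch of both kinds of reference. The sample sentences and the group name `quote` are invented for illustration. In JavaScript, a named group is referenced inside the pattern with `\k<name>`.

```javascript
// \1 must repeat exactly what group 1 captured: a doubled word.
const doubled = /\b(\w+) \1\b/;
console.log(doubled.test("this is is a test")); // true:  "is is"
console.log(doubled.test("this is a test"));    // false

// Named group: the closing quote must be the same kind as the opening one.
const quoted = /(?<quote>['"]).*?\k<quote>/;
const found = 'say "hi" now'.match(quoted);
console.log(found[0]);           // "hi"
console.log(found.groups.quote); // "
```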
Lookaround lets you match a group before (lookbehind) or after (lookahead) your main pattern without including it in the result. In addition, you can specify a positive or negative lookaround. Negative lookarounds specify a group that can NOT match before or after the pattern.
Examples of Look-ahead and Look-behind are as follows:
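- `(?=abc)` Positive Look-ahead - Matches the main expression only if it is followed by `abc`, without including `abc` in the result.
- `(?!abc)` Negative Look-ahead - Matches the main expression only if it is not followed by `abc`.
- `(?<=abc)` Positive Look-behind - Matches the main expression only if it is preceded by `abc`, without including `abc` in the result.
- `(?<!abc)` Negative Look-behind - Matches the main expression only if it is not preceded by `abc`.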
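All four lookarounds can be tried on one sample sentence. This is a sketch only, and the sentence and variable names are made up. The `\b` in the negative cases stops the engine from backtracking to a shorter number such as "10".

```javascript
const text = "cost $100 for 200 units";

// Positive look-ahead: digits followed by " units".
const beforeUnits = text.match(/\d+(?= units)/g);
console.log(beforeUnits); // [ '200' ]

// Negative look-ahead: whole numbers NOT followed by " units".
const notBeforeUnits = text.match(/\b\d+\b(?! units)/g);
console.log(notBeforeUnits); // [ '100' ]

// Positive look-behind: digits preceded by "$".
const afterDollar = text.match(/(?<=\$)\d+/g);
console.log(afterDollar); // [ '100' ]

// Negative look-behind: whole numbers NOT preceded by "$".
const notAfterDollar = text.match(/(?<!\$)\b\d+/g);
console.log(notAfterDollar); // [ '200' ]
```

In every case the "$" or " units" part is checked but left out of the result.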
My name is Brian Mojica, and I am a software engineer. I have a passion for creating beautiful, functional, and intuitive user experiences. For more information about me, you can see my latest projects on my GitHub or view my Portfolio.
If you wish to learn more about regex, you can find more information on the Regex site.
Link to gist-pad repository