In Python, regular expressions are a
powerful tool for working with text data. A
regular expression (also called regex or
regexp) is a sequence of characters that
defines a search pattern. Regular expressions
can be used to search, replace, and
manipulate text.
Python provides a module called re for
working with regular expressions. The re
module contains functions for compiling
regular expressions, searching for matches
in text, and manipulating text using regular
expressions.
A real-world use case for regular
expressions:
Regular expressions are used in the
banking industry to validate the format of
account numbers. By matching a string
against a pattern, it is possible to verify
whether it has the prescribed format.
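As a rough illustration of this use case, the sketch below validates an account number with re.fullmatch-style matching. The format (two uppercase letters followed by ten digits) and the helper name is_valid_account are assumptions made up for the example, not a real banking standard.

```python
import re

# Hypothetical format, chosen only for illustration: two uppercase
# letters followed by exactly ten digits, e.g. "AB1234567890".
ACCOUNT_PATTERN = re.compile(r"[A-Z]{2}\d{10}")


def is_valid_account(number: str) -> bool:
    """Return True if the whole string has the assumed account format."""
    return ACCOUNT_PATTERN.fullmatch(number) is not None


print(is_valid_account("AB1234567890"))   # True
print(is_valid_account("AB12345"))        # False: too few digits
print(is_valid_account("xAB1234567890"))  # False: extra leading character
```

fullmatch is used rather than search so that extra characters before or after the account number cause the check to fail.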
Special Characters:
\ Escapes special characters or creates
special sequences
. Matches any character except for
newline characters
^ Matches the start of a string (and of
each line in MULTILINE mode)
$ Matches the end of a string, or just
before a trailing newline
() Creates a capturing group for
extracting a substring
| Matches either the expression before
or after the pipe character
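The special characters listed above can be tried out in a few lines. This is a minimal sketch; the sample strings and variable names are invented for illustration.

```python
import re

# "." matches any single character except a newline.
dot_matches = re.findall(r"c.t", "cat cot c\nt")
print(dot_matches)  # ['cat', 'cot']

# "^" and "$" anchor the pattern to the start and end of the string.
starts_with_hello = re.search(r"^Hello", "Hello world") is not None
ends_with_world = re.search(r"world$", "Hello world") is not None
print(starts_with_hello, ends_with_world)  # True True

# "\" escapes a special character so that it is matched literally.
literal_dots = re.findall(r"\.", "a.b.c")
print(literal_dots)  # ['.', '.']

# "()" captures a substring; "|" chooses between alternatives.
pet = re.search(r"(cat|dog)s", "I like dogs").group(1)
print(pet)  # dog
```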
Character classes/sets:
[] Matches any one character inside the
brackets (ranges such as a-z are allowed)
[^] Matches any one character not inside
the brackets
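A short sketch of character sets and negated sets, using made-up sample strings:

```python
import re

# [aeiou] matches any one of the listed characters.
vowels = re.findall(r"[aeiou]", "regex")
print(vowels)  # ['e', 'e']

# [^aeiou] matches any one character that is NOT listed.
non_vowels = re.findall(r"[^aeiou]", "regex")
print(non_vowels)  # ['r', 'g', 'x']

# Ranges can be combined inside a single set.
hex_chars = re.findall(r"[A-F0-9]", "B7 z")
print(hex_chars)  # ['B', '7']
```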
Quantifiers:
* Matches zero or more occurrences of
the preceding element (a character,
set, or group)
+ Matches one or more occurrences of
the preceding element
? Matches zero or one occurrence of
the preceding element
{n} Matches exactly n occurrences of the
preceding element
{n,} Matches at least n occurrences of the
preceding element
{n,m} Matches between n and m
occurrences of the preceding
element
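The quantifiers can be compared side by side as follows. This is an illustrative sketch with invented sample strings; note that a quantifier applies to whatever element precedes it, including a group.

```python
import re

# "*" allows zero occurrences, "+" requires at least one.
zero_or_more = re.fullmatch(r"ab*", "a") is not None
one_or_more = re.fullmatch(r"ab+", "a") is not None
print(zero_or_more, one_or_more)  # True False

# "?" makes the preceding element optional.
optional_u = re.fullmatch(r"colou?r", "color") is not None
print(optional_u)  # True

# {n}, {n,} and {n,m} bound the number of repetitions.
exactly_three = re.findall(r"\d{3}", "12 345 6789")
at_least_two = re.findall(r"\d{2,}", "1 22 333")
two_to_three = re.findall(r"\d{2,3}", "1 22 333 4444")
print(exactly_three)  # ['345', '678']
print(at_least_two)   # ['22', '333']
print(two_to_three)   # ['22', '333', '444']

# A quantifier after a group repeats the whole group.
group_twice = re.fullmatch(r"(ab){2}", "abab") is not None
print(group_twice)  # True
```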
Special Sequences:
\d Matches any digit character (0-9).
\D Matches any non-digit character.
\s Matches any whitespace character
(space, tab, newline, etc.).
\S Matches any non-whitespace
character.
\w Matches any word character (letters,
digits, and _; Unicode-aware by
default for str patterns).
\W Matches any non-word character.
\b Matches a word boundary (the
position between a word character
and a non-word character).
\B Matches a non-word boundary.
\A Matches the start of a string.
\Z Matches the end of a string.
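\G Not supported by Python's re module.
In regex flavors that support it (for
example PCRE), it matches at the end
of the previous match, or at the start
of the string if there is no previous
match.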
\n Matches a newline character.
\t Matches a tab character.
\r Matches a carriage return character.
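A few of these sequences in action. The sample strings are invented, and re.MULTILINE is used only to contrast ^ with \A.

```python
import re

text = "Call 555-0199 at 9am\tsharp"

# \d matches digits, \w matches word characters.
digits = re.findall(r"\d+", text)
words = re.findall(r"\w+", text)
print(digits)  # ['555', '0199', '9']
print(words)   # ['Call', '555', '0199', 'at', '9am', 'sharp']

# \s matches whitespace, including the tab in the sample text.
fields = re.split(r"\s+", text)
print(fields)  # ['Call', '555-0199', 'at', '9am', 'sharp']

# \b restricts a match to whole words.
whole_words = re.findall(r"\bcat\b", "cat concat cat.")
print(whole_words)  # ['cat', 'cat']

# In MULTILINE mode "^" matches at every line start, \A only at the
# very start of the string.
lines = "line one\nline two"
caret_hits = re.findall(r"^line", lines, re.MULTILINE)
start_hits = re.findall(r"\Aline", lines, re.MULTILINE)
print(len(caret_hits), len(start_hits))  # 2 1
```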
Most commonly used functions of the re
module:
re.compile(pattern, flags=0) - Compiles a
regular expression pattern into a regular
expression object.
re.search(pattern, string, flags=0) - Searches
a string for a match to the specified regular
expression pattern and returns the first
match found.
re.match(pattern, string, flags=0) - Attempts
to match the specified regular expression
pattern at the beginning of a string.
re.fullmatch(pattern, string, flags=0) -
Attempts to match the entire string with the
specified regular expression pattern.
re.split(pattern, string, maxsplit=0, flags=0)
- Splits a string into a list of substrings using
a regular expression pattern as the delimiter.
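re.findall(pattern, string, flags=0) - Finds all
non-overlapping matches of a regular
expression pattern in a string and returns
them as a list of strings. If the pattern
contains capturing groups, the list holds the
group contents instead (as tuples when there
is more than one group).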
re.finditer(pattern, string, flags=0) - Finds
all non-overlapping matches of a regular
expression pattern in a string and returns
them as an iterator of match objects.
re.sub(pattern, repl, string, count=0,
flags=0) - Replaces occurrences of a regular
expression pattern in a string with repl (a
replacement string or a function) and returns
the new string. With count=0, all occurrences
are replaced; otherwise at most count are.
re.subn(pattern, repl, string, count=0,
flags=0) - Performs the same substitution as
re.sub() and returns a tuple containing the
new string and the number of substitutions
made.
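The module-level functions above can be compared on a single input. This is a minimal sketch; the sample text and variable names are invented for illustration.

```python
import re

text = "apples: 3, pears: 12, plums: 7"

# search finds the first match anywhere in the string.
first_number = re.search(r"\d+", text).group()
print(first_number)  # 3

# match only succeeds at the beginning of the string.
number_at_start = re.match(r"\d+", text)
word_at_start = re.match(r"\w+", text).group()
print(number_at_start, word_at_start)  # None apples

# fullmatch requires the whole string to match.
all_digits = re.fullmatch(r"\d+", "12345") is not None
print(all_digits)  # True

# split uses the pattern as the delimiter.
parts = re.split(r",\s*", text)
print(parts)  # ['apples: 3', 'pears: 12', 'plums: 7']

# findall returns strings, finditer yields match objects.
numbers = re.findall(r"\d+", text)
starts = [m.start() for m in re.finditer(r"\d+", text)]
print(numbers)  # ['3', '12', '7']
print(starts)   # [8, 18, 29]

# sub replaces matches; subn also reports how many were replaced.
masked = re.sub(r"\d+", "N", text)
masked_two, replaced = re.subn(r"\d+", "N", text, count=2)
print(masked)                # apples: N, pears: N, plums: N
print(masked_two, replaced)  # apples: N, pears: N, plums: 7 2
```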
Methods of a compiled regular expression
object:
search(string[, pos[, endpos]]) - Scan
through the string looking for a match to the
pattern, returning a match object, or None if
no match was found.
match(string[, pos[, endpos]]) - Determine
if the RE matches at the beginning of the
string, returning a match object, or None if
no match was found.
fullmatch(string[, pos[, endpos]]) - Match
the entire string to the pattern, returning a
match object, or None if no match was
found.
split(string[, maxsplit]) - Split the string by
the occurrences of the pattern.
findall(string[, pos[, endpos]]) - Find all
non-overlapping matches of the pattern in
the string and return them as a list.
finditer(string[, pos[, endpos]]) - Find all
non-overlapping matches of the pattern in
the string and return them as an iterator.
sub(repl, string[, count]) - Return a new
string with all occurrences of the pattern
replaced by the replacement string.
subn(repl, string[, count]) - Perform the
same operation as sub(), but also return the
number of substitutions made.
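The methods of a compiled pattern mirror the module-level functions, with the optional pos and endpos arguments limiting the region that is searched. A small sketch with invented sample data:

```python
import re

word = re.compile(r"[a-z]+")
whitespace = re.compile(r"\s+")
text = "one two three"

# Without pos, the search starts at index 0.
first = word.search(text).group()
print(first)  # one

# pos=4 starts the search (or match) at index 4, where "two" begins.
from_four = word.search(text, 4).group()
at_four = word.match(text, 4).group()
print(from_four, at_four)  # two two

# pos and endpos together restrict the match to text[4:7].
exact = word.fullmatch(text, 4, 7).group()
print(exact)  # two

# findall over the first seven characters only.
early_words = word.findall(text, 0, 7)
print(early_words)  # ['one', 'two']

# split, sub and subn accept maxsplit / count limits.
head_tail = whitespace.split(text, maxsplit=1)
two_replaced = word.sub("X", text, count=2)
all_replaced = word.subn("X", text)
print(head_tail)     # ['one', 'two three']
print(two_replaced)  # X X three
print(all_replaced)  # ('X X X', 3)
```

Compiling once and reusing the object is mainly a convenience when the same pattern is applied many times.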
Examples:
\d{3}-\d{2}-\d{4}
Matches a social security number in the
format of ###-##-####.
[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
Matches an email address.
\b[A-Z][a-z]*\b
Matches a single word starting with a capital
letter (e.g. "John"; in "New York" it matches
"New" and "York" separately).
^(?=.*[A-Z])(?=.*[0-9])(?=.*[!@#$&*]).{8,}$
Matches a password that contains at least
one uppercase letter, one digit, one special
character from the set !@#$&*, and is at
least 8 characters long.
^[a-zA-Z0-9_.-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$
Matches a string that consists entirely of an
email address (the ^ and $ anchors reject any
surrounding text).
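Some of the example patterns above can be checked against sample input as follows. The test strings are invented for illustration, and the SSN pattern is applied with fullmatch so that the whole string must have the ###-##-#### shape.

```python
import re

ssn = re.compile(r"\d{3}-\d{2}-\d{4}")
capitalized = re.compile(r"\b[A-Z][a-z]*\b")
password = re.compile(r"^(?=.*[A-Z])(?=.*[0-9])(?=.*[!@#$&*]).{8,}$")

# Social security number format.
good_ssn = ssn.fullmatch("123-45-6789") is not None
bad_ssn = ssn.fullmatch("123-456-789") is not None
print(good_ssn, bad_ssn)  # True False

# Capitalized words are matched one at a time.
names = capitalized.findall("John moved to New York")
print(names)  # ['John', 'New', 'York']

# Each lookahead (?=...) checks one requirement without consuming text.
strong = password.match("Passw0rd!") is not None
no_upper = password.match("password1!") is not None
too_short = password.match("Ab1!") is not None
print(strong, no_upper, too_short)  # True False False
```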
Python Code Example:
import re

# Define a regular expression pattern to match email addresses
email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'

# Create a sample string that contains email addresses
sample_text = "John's email is john@example.com and Jane's email is jane123@example.org"

# Use the `findall` function of the `re` module to find all email
# addresses in the sample text
email_addresses = re.findall(email_pattern, sample_text)

# Print the email addresses that were found
print(email_addresses)  # ['john@example.com', 'jane123@example.org']
In this example, we define a regular
expression pattern to match email
addresses using the r prefix to denote a raw
string. We then create a sample text that
contains email addresses. We use the
findall function of the re module to find all
email addresses in the sample text that
match the pattern we defined. Finally, we
print the email addresses that were found.