Regular expressions have the worst reputation in programming. They look like someone smashed the keyboard: ^(?:[a-zA-Z0-9._%+-]+)@(?:[a-zA-Z0-9.-]+)\.(?:[a-zA-Z]{2,})$. But regex is just a language with a small vocabulary. Learn about ten symbols and you can read most patterns you will meet.
The Building Blocks
| Symbol | Meaning | Example | Matches |
|---|---|---|---|
| `.` | Any character (except newline) | `h.t` | hat, hot, hit |
| `*` | Zero or more of previous | `ab*c` | ac, abc, abbc |
| `+` | One or more of previous | `ab+c` | abc, abbc (not ac) |
| `?` | Zero or one of previous | `colou?r` | color, colour |
| `^` | Start of string | `^Hello` | Hello world (not Say Hello) |
| `$` | End of string | `end$` | the end (not endless) |
| `[abc]` | Any one of these characters | `[aeiou]` | any vowel |
| `[^abc]` | Any character NOT in set | `[^0-9]` | any non-digit |
| `(group)` | Capture group | `(\d{3})` | captures 3 digits |
| `a\|b` | Either a or b | `cat\|dog` | cat or dog |
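To see the building blocks in action, here is a small sketch using Python's `re` module. The sample strings and the helper name `matches` are made up for illustration; `re.search` looks for the pattern anywhere in the text, which is why `^` and `$` matter.

```python
import re

# (pattern, text it should match, text it should not match)
examples = [
    (r"h.t", "hot", "ht"),
    (r"ab*c", "ac", "abd"),
    (r"ab+c", "abbc", "ac"),
    (r"colou?r", "colour", "colouur"),
    (r"^Hello", "Hello world", "Say Hello"),
    (r"end$", "the end", "endless"),
    (r"[aeiou]", "sky high", "rhythm"),
    (r"[^0-9]", "42a", "42"),
    (r"cat|dog", "hot dog", "bird"),
]


def matches(pattern: str, text: str) -> bool:
    """True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None


for pattern, good, bad in examples:
    print(f"{pattern!r:12} {good!r}: {matches(pattern, good)}  {bad!r}: {matches(pattern, bad)}")

# A capture group exposes the part of the match inside the parentheses.
area = re.search(r"(\d{3})", "call 555-1234")
print(area.group(1))  # 555
```

Every row should print `True` for the first string and `False` for the second.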
Character Classes (Shortcuts)
| Shortcut | Equivalent | Meaning |
|---|---|---|
| `\d` | `[0-9]` | Any digit |
| `\D` | `[^0-9]` | Any non-digit |
| `\w` | `[a-zA-Z0-9_]` | Any word character |
| `\W` | `[^a-zA-Z0-9_]` | Any non-word character |
| `\s` | `[ \t\n\r\f]` | Any whitespace |
| `\S` | `[^ \t\n\r\f]` | Any non-whitespace |
| `\b` | n/a | Word boundary |
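A short sketch of the shortcuts, again in Python with made-up sample text. One caveat worth knowing: in Python 3, `\d`, `\w` and `\s` on `str` patterns also match non-ASCII digits, letters and whitespace, so the "Equivalent" column holds exactly only when the `re.ASCII` flag is set.

```python
import re

text = "Order 66 shipped to unit_7 on 2026-04-28."

digits = re.findall(r"\d+", text)    # runs of digits
words = re.findall(r"\w+", text)     # runs of word characters (underscore included)
fields = re.split(r"\s+", "a  b\tc\nd")  # split on any run of whitespace

# \b matches the empty position between a word character and a non-word one,
# so it finds "cat" as a whole word but not inside "concat" or "cats".
whole_cats = re.findall(r"\bcat\b", "cat concat cats cat.")

print(digits)      # ['66', '7', '2026', '04', '28']
print(words)
print(fields)      # ['a', 'b', 'c', 'd']
print(whole_cats)  # ['cat', 'cat']
```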
Quantifiers
# Exact, minimum, range
\d{3} # Exactly 3 digits: 123
\d{2,4} # 2 to 4 digits: 12, 123, 1234
\d{3,} # 3 or more digits: 123, 123456
# Greedy vs Lazy
.* # Greedy: matches as MUCH as possible
.*? # Lazy: matches as LITTLE as possible
# Example:
# Input: "<b>hello</b> world <b>foo</b>"
# <b>.*</b> matches: "<b>hello</b> world <b>foo</b>" (greedy: everything)
# <b>.*?</b> matches: "<b>hello</b>" (lazy: first match)
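The greedy versus lazy example above can be checked directly. This is a minimal sketch in Python using the same input string; the variable names are only illustrative.

```python
import re

html = "<b>hello</b> world <b>foo</b>"

# Greedy: .* runs to the end, then backs up to the LAST </b>.
greedy = re.search(r"<b>.*</b>", html).group(0)

# Lazy: .*? stops at the FIRST </b> it can reach.
lazy = re.search(r"<b>.*?</b>", html).group(0)

# With a capture group, findall returns the text inside each tag pair.
contents = re.findall(r"<b>(.*?)</b>", html)

print(greedy)    # <b>hello</b> world <b>foo</b>
print(lazy)      # <b>hello</b>
print(contents)  # ['hello', 'foo']

# Counted quantifiers: fullmatch requires the whole string to fit.
print(bool(re.fullmatch(r"\d{2,4}", "123")))    # True
print(bool(re.fullmatch(r"\d{2,4}", "12345")))  # False
```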
Reading Regex Left to Right
Every regex can be read as a sentence. Let us decode a real-world pattern:
# Pattern: ^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z$
# Reading left to right:
# ^ Start of string
# \d{4} 4 digits (year)
# - literal dash
# \d{2} 2 digits (month)
# - literal dash
# \d{2} 2 digits (day)
# T literal "T"
# \d{2} 2 digits (hour)
# : literal colon
# \d{2} 2 digits (minute)
# : literal colon
# \d{2} 2 digits (second)
# Z literal "Z" (UTC)
# $ End of string
# Result: ISO 8601 datetime like "2026-04-28T10:30:00Z"
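The same left-to-right reading can be written into the pattern itself with Python's `re.VERBOSE` flag, which ignores whitespace and allows `#` comments inside the pattern. The sketch below is one possible way to use it; the function names are hypothetical. Note that the regex only checks the shape of the string: `2026-13-45T99:99:99Z` has the right shape but is not a real date, so the second function adds a `datetime.strptime` check.

```python
import re
from datetime import datetime

ISO_UTC = re.compile(
    r"""
    \d{4} - \d{2} - \d{2}   # year-month-day
    T                       # literal "T"
    \d{2} : \d{2} : \d{2}   # hour:minute:second
    Z                       # literal "Z" (UTC)
    """,
    re.VERBOSE,
)


def has_iso_shape(s: str) -> bool:
    """True if s has the YYYY-MM-DDTHH:MM:SSZ shape (digits not range-checked)."""
    # fullmatch anchors both ends, so ^ and $ are not needed in the pattern.
    return ISO_UTC.fullmatch(s) is not None


def is_valid_iso_utc(s: str) -> bool:
    """True if s has the right shape AND names a real date and time."""
    if not has_iso_shape(s):
        return False
    try:
        datetime.strptime(s, "%Y-%m-%dT%H:%M:%SZ")
    except ValueError:
        return False
    return True


print(has_iso_shape("2026-04-28T10:30:00Z"))     # True
print(has_iso_shape("2026-13-45T99:99:99Z"))     # True (shape only)
print(is_valid_iso_utc("2026-13-45T99:99:99Z"))  # False
```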
Practical Patterns You Will Actually Use
1. Email Validation (Practical, Not RFC-Perfect)
import re
email_pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}