A Practical Guide to Regular Expressions


Regular expressions are a compact language for describing patterns in text. Once you can read them, they turn fiddly string-handling problems into a few characters of declarative code. The catch is that the syntax is terse and easy to misread, which gives regex a reputation for being write-only. It does not have to be that way.

This guide walks through the building blocks you actually use day to day, with a JavaScript focus but ideas that carry across most languages. We cover the core syntax, quantifiers, groups, flags, a handful of realistic patterns, and the traps worth avoiding, with patterns you can try as you read.

What Regex Is and Where You Use It

A regular expression is a pattern that a regex engine compares against a string. If the pattern describes some part of the string you get a match, and often you also get back which characters matched. In JavaScript a pattern is written between slashes, as in /cat/, or built from a string with new RegExp('cat'). The engine then scans your text for that pattern.
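As a minimal sketch of the two ways to create a pattern, the snippet below builds the same regex as a literal and through the constructor. The sample text and variable names are illustrative assumptions, not part of the original guide. Note that a backslash has to be doubled inside a constructor string.

```javascript
// Two ways to build the same pattern.
const literal = /cat/;
const built = new RegExp('cat');

const text = 'concatenate'; // illustrative sample string

console.log(literal.test(text)); // true: 'cat' appears inside 'concatenate'
console.log(built.test(text));   // true

// exec returns the match details, including where the match starts.
const found = literal.exec(text);
console.log(found[0], found.index); // 'cat' 3

// In a constructor string the backslash itself must be escaped.
const digitsLiteral = /\d+/;
const digitsBuilt = new RegExp('\\d+');
console.log(digitsBuilt.source === digitsLiteral.source); // true
```

The constructor form is mainly useful when part of the pattern is only known at run time.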

The everyday uses fall into a few buckets. Validation checks whether input has the right shape, such as confirming a field looks like an email address. Search and replace finds every occurrence of a pattern and optionally rewrites it, which powers find-in-files and string.replace. Parsing and extraction pulls structured pieces out of semi-structured text, like grabbing all the prices from a page. And log analysis filters lines that share a shape, which is why tools like grep are built around regex. If you find yourself writing nested loops over character indexes, a regex is often the clearer answer.
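The four buckets above can be sketched in a few lines each. The patterns and sample data here are illustrative assumptions (for instance, the five-digit rule is just a stand-in for a real validation rule).

```javascript
// Validation: does the whole input have the right shape?
const isFiveDigits = /^\d{5}$/.test('90210');

// Search and replace: collapse runs of whitespace to a single space.
const tidy = 'too    many   spaces'.replace(/\s+/g, ' ');

// Extraction: pull every price out of free text.
const prices = 'Tea $3.50, cake $12.00'.match(/\$\d+\.\d{2}/g);

// Log filtering: keep only the lines that start with ERROR.
const log = ['INFO boot', 'ERROR disk full', 'INFO done'];
const errors = log.filter((line) => /^ERROR\b/.test(line));

console.log(isFiveDigits); // true
console.log(tidy);         // 'too many spaces'
console.log(prices);       // ['$3.50', '$12.00']
console.log(errors);       // ['ERROR disk full']
```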

Core Building Blocks

The simplest pattern is a literal: /dog/ matches the letters d, o, g in sequence. Most patterns need to describe characters more loosely, and character classes are the workhorses. The shorthand \d matches any digit, \w a word character (by default in JavaScript, the ASCII letters, digits, and underscore), and \s whitespace such as spaces, tabs, and newlines. Their uppercase versions \D, \W, and \S mean the opposite.
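To see the shorthand classes side by side, here is a small sketch run against one illustrative sample string (the string itself is an assumption chosen to contain digits, an underscore, and spaces).

```javascript
const sample = 'Room 42, floor_3';

const digits = sample.match(/\d/g);      // each single digit
const words = sample.match(/\w+/g);      // runs of word characters
const spaces = sample.match(/\s/g);      // each whitespace character
const nonDigits = sample.match(/\D+/g);  // runs of anything that is not a digit

console.log(digits);    // ['4', '2', '3']
console.log(words);     // ['Room', '42', 'floor_3']
console.log(spaces);    // [' ', ' ']
console.log(nonDigits); // ['Room ', ', floor_']
```

Note that the underscore counts as a word character, which is why 'floor_3' comes back as a single run.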

You can also define your own class with square brackets. The pattern [aeiou] matches any single vowel, and a range like [a-z] matches any lowercase letter. A leading caret negates the class, so [^0-9] matches any non-digit. The dot, written as ., matches any character except a line break by default.
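A short sketch of custom classes, ranges, negation, and the dot follows. The sample strings are illustrative assumptions.

```javascript
const label = 'Regex 101';

const vowels = label.match(/[aeiou]/g);      // any single lowercase vowel
const lowerRuns = label.match(/[a-z]+/g);    // runs of lowercase letters
const nonDigitRuns = label.match(/[^0-9]+/g); // runs of non-digits

// The dot skips line breaks by default.
const dotMatches = 'a\nb'.match(/./g);

console.log(vowels);       // ['e', 'e']
console.log(lowerRuns);    // ['egex']
console.log(nonDigitRuns); // ['Regex ']
console.log(dotMatches);   // ['a', 'b']
```

The capital R is missing from the second result because [a-z] is case-sensitive unless a flag says otherwise.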

Anchors pin a pattern to a position rather than a character. The caret ^ asserts the start of the string and the dollar sign $ asserts the end, so /^abc$/ matches only the exact string 'abc'. The word boundary \b matches the empty position between a word character and a non-word character, or between a word character and the start or end of the string, letting /\bcat\b/ match 'cat' as a whole word but not inside 'category'.
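The anchors described above can be checked directly. This is a small sketch with illustrative inputs.

```javascript
const exact = /^abc$/;
const wholeWord = /\bcat\b/;

const anchorResults = {
  exactMatch: exact.test('abc'),         // the whole string is 'abc'
  trailingExtra: exact.test('abcd'),     // extra character after 'abc'
  leadingExtra: exact.test('xabc'),      // extra character before 'abc'
  inSentence: wholeWord.test('the cat sat'),
  insideWord: wholeWord.test('category'),
  aloneInString: wholeWord.test('cat'),  // string edges count as boundaries
};

console.log(anchorResults);
// { exactMatch: true, trailingExtra: false, leadingExtra: false,
//   inSentence: true, insideWord: false, aloneInString: true }
```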

Quantifiers and Greedy vs Lazy Matching

Quantifiers say how many times the preceding element may repeat. The star * means zero or more, the plus + means one or more, and the question mark ? means zero or one, making it optional. For precise counts, braces give control: {3} means exactly three, {2,5} between two and five, and {2,} two or more. So \d{4} matches a four-digit year and \w+ matches a run of word characters.
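Each quantifier can be tried in isolation. The following sketch uses illustrative inputs; the 'colou?r' pattern is an assumed example, not one from the guide.

```javascript
// ? makes the preceding element optional.
const optionalU = /^colou?r$/;
const bothSpellings = [optionalU.test('color'), optionalU.test('colour')];

// {4} means exactly four repetitions.
const yearFound = 'Released in 2015'.match(/\d{4}/)[0];

// {2,5} means between two and five repetitions.
const twoToFive = /^\d{2,5}$/;
const rangeChecks = ['1', '12', '12345', '123456'].map((s) => twoToFive.test(s));

// * allows zero repetitions, so it can match the empty string.
const emptyMatch = 'bbb'.match(/a*/)[0];

// + needs at least one repetition.
const firstWord = 'hello world'.match(/\w+/)[0];

console.log(bothSpellings); // [true, true]
console.log(yearFound);     // '2015'
console.log(rangeChecks);   // [false, true, true, false]
console.log(emptyMatch);    // ''
console.log(firstWord);     // 'hello'
```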

By default quantifiers are greedy, meaning they grab as much text as possible and give back only what they must for the rest of the pattern to match. This bites people with delimited content. Against 'a<one>b<two>c', the pattern /<.*>/ matches '<one>b<two>' in one go, because .* consumes everything up to the last >. Appending a ? makes the quantifier lazy, so /<.*?>/ stops at the first > and matches just '<one>'. This single distinction resolves a large share of confusing regex results.
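The greedy and lazy results described above can be reproduced with the same sample string. The last pattern, a negated class, is an alternative added here for comparison rather than something the paragraph prescribes.

```javascript
const tagged = 'a<one>b<two>c';

const greedy = tagged.match(/<.*>/)[0];    // .* runs to the last >
const lazy = tagged.match(/<.*?>/)[0];     // .*? stops at the first >
const allLazy = tagged.match(/<.*?>/g);    // every tag, lazily
const allNegated = tagged.match(/<[^>]*>/g); // same result without laziness

console.log(greedy);     // '<one>b<two>'
console.log(lazy);       // '<one>'
console.log(allLazy);    // ['<one>', '<two>']
console.log(allNegated); // ['<one>', '<two>']
```

The negated class says directly what may appear between the brackets, which often reads more clearly than relying on a lazy quantifier.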

Groups, Alternation, and Backreferences

Parentheses group part of a pattern so a quantifier or alternation applies to the whole thing. They also capture: each pair of parentheses creates a numbered capture group whose matched text you can retrieve afterward. In /(\d{4})-(\d{2})-(\d{2})/ on a date, group 1 holds the year, group 2 the month, and group 3 the day. JavaScript also supports named groups like (?<year>\d{4}), which read better than counting positions.
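Here is a sketch of numbered and named groups applied to a date. The input string and the reordered output format are illustrative assumptions.

```javascript
// Numbered groups: index 0 is the whole match, 1..n are the groups.
const numbered = /(\d{4})-(\d{2})-(\d{2})/;
const m = numbered.exec('Due 2024-03-15');
console.log(m[0]);             // '2024-03-15'
console.log(m[1], m[2], m[3]); // '2024' '03' '15'

// Named groups: the captures are available on the .groups object.
const named = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const { year, month, day } = named.exec('Due 2024-03-15').groups;
console.log(year, month, day); // '2024' '03' '15'

// Named groups can also be referenced in a replacement string.
const reordered = '2024-03-15'.replace(named, '$<day>/$<month>/$<year>');
console.log(reordered);        // '15/03/2024'
```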

When you need grouping but not the captured text, use a non-capturing group written (?:...), which keeps group numbers clean. The pipe | is alternation, meaning logical or, so /(jpe?g|png|gif)/ matches any of those image extensions. Finally, a backreference like \1 matches the same text a previous group captured, so /\b(\w+)\s+\1\b/ finds a word repeated back to back, such as 'the the'. The \b anchors matter here: without them the pattern also fires on partial overlaps, such as the 'is is' hiding inside 'this is'.
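The three ideas in this paragraph can be sketched as follows. The file names and sentences are illustrative, and the repeated-word pattern is written with \b word boundaries as an assumption so that it only reports whole words.

```javascript
// Non-capturing group plus alternation: group for structure, capture nothing.
const imageExt = /\.(?:jpe?g|png|gif)$/i;
const extChecks = ['photo.JPEG', 'icon.png', 'notes.txt'].map((f) => imageExt.test(f));

// A non-capturing group adds no numbered entry to the result.
const repeated = 'ababab'.match(/(?:ab)+/);

// Backreference: \1 must repeat whatever group 1 captured.
const doubled = /\b(\w+)\s+\1\b/;
const hit = doubled.exec('Paris in the the spring');

// Without the boundaries, the same idea also fires inside 'this is'.
const loose = /(\w+)\s+\1/.exec('this is');

console.log(extChecks);                 // [true, true, false]
console.log(repeated[0], repeated.length); // 'ababab' 1
console.log(hit[0], hit[1]);            // 'the the' 'the'
console.log(doubled.test('this is'));   // false
console.log(loose[0]);                  // 'is is'
```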

Flags That Change the Match

Flags are letters appended after the closing slash that adjust how the whole pattern behaves. The g flag is global: without it most operations stop at the first match, and with it methods like replace work across every occurrence. matchAll goes further and requires the flag, throwing a TypeError on a non-global regex. The i flag makes matching case-insensitive, so /hello/i matches 'Hello' and 'HELLO'.
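A small sketch shows the effect of the g and i flags. The sample strings are illustrative.

```javascript
const sentence = 'one fish two fish';

// Without g, replace only touches the first occurrence.
const firstOnly = sentence.replace(/fish/, 'cat');
const everyOne = sentence.replace(/fish/g, 'cat');

// matchAll needs the g flag and yields one match object per occurrence.
const runs = [...'a1b22'.matchAll(/\d+/g)].map((match) => match[0]);

// Calling matchAll with a non-global regex throws a TypeError.
let matchAllError = null;
try {
  'a1b22'.matchAll(/\d+/);
} catch (err) {
  matchAllError = err;
}

// The i flag ignores case.
const shouting = /hello/i.test('HELLO');

console.log(firstOnly); // 'one cat two fish'
console.log(everyOne);  // 'one cat two cat'
console.log(runs);      // ['1', '22']
console.log(matchAllError instanceof TypeError); // true
console.log(shouting);  // true
```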

The m flag is multiline, changing ^ and $ to match at the start and end of each line rather than the whole string, which is handy for log files. The s flag, called dotAll, lets the dot also match newlines so a pattern can span lines. Flags combine freely, as in /^error/gim. DevFmt's Regex Tester exposes these as checkboxes so you can toggle them and watch the matches update.
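To make the m and s flags concrete, here is a sketch against a small multi-line string. The log text is an invented sample.

```javascript
const logText = 'info ok\nerror disk\nError net';

// Without m, ^ only matches at the very start of the string.
const startOnly = logText.match(/^error/g);

// With m, ^ matches at the start of every line.
const perLine = logText.match(/^error/gm);

// Adding i picks up the capitalised line as well.
const perLineAnyCase = logText.match(/^error/gim);

// $ follows the same rule: last word of each line versus last word overall.
const lastWords = logText.match(/\w+$/gm);
const lastWordOverall = logText.match(/\w+$/g);

// Without s the dot stops at a newline; with s it crosses it.
const dotStops = /start.*end/.test('start\nend');
const dotCrosses = /start.*end/s.test('start\nend');

console.log(startOnly);       // null
console.log(perLine);         // ['error']
console.log(perLineAnyCase);  // ['error', 'Error']
console.log(lastWords);       // ['ok', 'disk', 'net']
console.log(lastWordOverall); // ['net']
console.log(dotStops, dotCrosses); // false true
```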

Practical Examples and Common Pitfalls

A few concrete patterns show how the pieces fit together. A rough email-like check is /^[\w.+-]+@[\w-]+\.[\w.-]+$/: name characters, an at sign, a domain, a dot, and a top-level domain. It checks shape only and will accept some strings that are not valid addresses. To extract numbers, /\d+/g with matchAll yields a match object for every run of digits, and the backreference pattern above with the i flag finds duplicated words case-insensitively.
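The three patterns mentioned here can be tried as follows. The addresses and sentences are made-up samples, and the duplicate-word pattern is written with \b boundaries as an assumption so that only whole words are reported.

```javascript
// Rough email-like check: shape only, not real address validation.
const emailLike = /^[\w.+-]+@[\w-]+\.[\w.-]+$/;
const emailChecks = {
  typical: emailLike.test('ada.lovelace+news@example.co.uk'),
  noAtSign: emailLike.test('not an email'),
  noDot: emailLike.test('user@localhost'),
  // The pattern is loose: this malformed input still passes.
  trailingDots: emailLike.test('a@b..'),
};

// Extract every run of digits and convert each to a number.
const quantities = [...'3 apples, 12 pears, 150 grapes'.matchAll(/\d+/g)]
  .map((match) => Number(match[0]));

// Duplicated words, ignoring case.
const duplicates = 'The the cat sat sat down'.match(/\b(\w+)\s+\1\b/gi);

console.log(emailChecks);
// { typical: true, noAtSign: false, noDot: false, trailingDots: true }
console.log(quantities); // [3, 12, 150]
console.log(duplicates); // ['The the', 'sat sat']
```

The trailingDots case is a reminder that a shape check like this is a first filter, not proof that an address exists or is well formed.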

The most important pitfall is catastrophic backtracking, the cause of ReDoS denial-of-service bugs. Patterns with nested, overlapping quantifiers such as /(a+)+$/ can take exponential time on certain non-matching inputs, for example a long run of a's followed by a character that rules out a match, as the engine tries every way to split the text. Avoid stacking quantifiers on groups that can match the same characters, and prefer specific classes over broad ones like .* where you can.
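As a hedged illustration, the sketch below compares the nested pattern with a flattened rewrite that accepts the same strings. The helper timeTest and the input length of 20 are assumptions chosen to keep the slow case short; timings vary by engine and machine, so none are quoted, and the length should be raised with care because the work roughly doubles with each extra character.

```javascript
const risky = /(a+)+$/; // nested quantifiers over the same characters
const safer = /a+$/;    // same accepted strings, one quantifier

// Both patterns agree on ordinary inputs.
const agreement = ['aaaa', 'baaa', 'aaab', ''].map(
  (input) => risky.test(input) === safer.test(input)
);
console.log(agreement); // [true, true, true, true]

// Hypothetical helper: run one test and report how long it took.
function timeTest(regex, input) {
  const start = Date.now();
  const matched = regex.test(input);
  return { matched, ms: Date.now() - start };
}

// A run of a's followed by a character that prevents the match.
const hostile = 'a'.repeat(20) + '!';
const riskyResult = timeTest(risky, hostile);
const saferResult = timeTest(safer, hostile);

console.log('risky:', riskyResult);
console.log('safer:', saferResult);
```

Both calls report no match; the difference is how much work a backtracking engine does to reach that answer.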

Beyond performance, lean toward clarity and break complex patterns into smaller steps. Know when not to reach for regex at all: recursive or nested structures like HTML and JSON are not regular, so a real parser is the right tool. And always test against real sample data, including the awkward edge cases, rather than trusting a pattern that merely looks right. Because DevFmt's Regex Tester runs entirely in your browser's JavaScript engine and never sends your input anywhere, you can safely paste production logs or user data while you refine a pattern.
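One way to break a pattern into smaller steps is to name the pieces, join them, and leave the checks regex is bad at to ordinary code. The sketch below is an assumption-laden example: the part names, the 1900 to 2099 year range, and the parseIsoDate helper are all hypothetical.

```javascript
// Small named pieces are easier to read and test than one long pattern.
const yearPart = /(?<year>(?:19|20)\d{2})/;
const monthPart = /(?<month>0[1-9]|1[0-2])/;
const dayPart = /(?<day>0[1-9]|[12]\d|3[01])/;

// Join the pieces through their .source strings.
const isoDate = new RegExp(
  `^${yearPart.source}-${monthPart.source}-${dayPart.source}$`
);

// Step 1 checks the shape with the regex; step 2 checks the calendar in code.
function parseIsoDate(input) {
  const match = isoDate.exec(input);
  if (!match) return null;
  const { year, month, day } = match.groups;
  const date = new Date(Date.UTC(Number(year), Number(month) - 1, Number(day)));
  // A rolled-over month means the day did not exist, as in 2024-02-30.
  if (date.getUTCMonth() !== Number(month) - 1) return null;
  return date;
}

console.log(isoDate.test('2024-02-30'));   // true: the shape is fine
console.log(parseIsoDate('2024-02-30'));   // null: the date does not exist
console.log(parseIsoDate('2024-13-01'));   // null: month 13 fails the shape check
console.log(parseIsoDate('2024-03-15').toISOString()); // '2024-03-15T00:00:00.000Z'
```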
