01 //Introduction
Regex Denial of Service (ReDoS) is a server-side availability bug where a single crafted input forces a regular-expression engine into exponential or polynomial work, pinning a CPU core and blocking the event loop or thread. The carrier is almost always a pattern with nested quantifiers ((a+)+$), overlapping alternation ((a|ab)+$), or a quantified optional group ((a?){50}a{50}) running on user-controlled input. The bug class was formalised by Crosby and Wallach in 2003, has hit Cloudflare, Stack Overflow, Express, npm packages with tens of millions of weekly downloads, and still ships fresh CVEs every quarter because the vulnerable pattern, an innocent-looking regex over user input, is the most natural way to write a validator.
The bug sits in the gap between the regex an author thinks they wrote and the work the engine actually performs. Most mainstream engines (PCRE, Perl, JavaScript, Python re, Java java.util.regex, .NET Regex, Ruby Onigmo) use a backtracking NFA: when a quantifier fails to match, the engine rewinds and tries other partitionings. For a benign input, this is fast. For an adversarial input against a vulnerable pattern, the engine explores 2^n partitionings before giving up. CWE-1333 (Inefficient Regular Expression Complexity) is the canonical reference; OWASP ReDoS is the introduction.
Modern engines have not switched off backtracking. Go's regexp (RE2-based) and Rust's regex crate are linear-time by construction, but the entire JavaScript, Python, Java, Ruby, .NET, and PCRE ecosystem still ships backtracking engines because backreferences and lookaround are widely used and not expressible in a pure NFA. .NET added Regex.MatchTimeout in 4.5 and a non-backtracking option in .NET 7, Java added java.util.regex improvements in 9 and 11, and V8 added a non-backtracking experimental engine, but none of these are defaults. Every code reviewer still has to read each regex on user input and verify it does not contain the evil shapes, or sandbox it behind RE2 / timeouts / a length cap.
OWASP catalogues the issue on its ReDoS page and the Denial of Service Cheat Sheet. The defensive recipe is identical across languages: do not run untrusted regex on user input; if the pattern is hard-coded, prove it is linear-time (no nested quantifiers, no overlapping alternation, anchored where possible); cap input length before the regex runs; and prefer a linear-time engine (RE2, Hyperscan, Rust's regex crate) for any production path that touches attacker-controlled bytes. This module walks through the catastrophic backtracking mechanism, the canonical evil-regex shapes, real CVEs from production systems, and the per-language fix that removes the bug class properly.
A Node.js endpoint validates email addresses with /^([a-zA-Z0-9])(([\\-.]|[_]+)?([a-zA-Z0-9]+))*(@){1}[a-z0-9]+[.]\\w{2,}$/. An attacker submits 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!'. What happens?
02 //The Attack Flow
Every ReDoS exploit follows the same four-stage flow: the attacker identifies a vulnerable pattern on a user-input path, crafts a small string (often under 100 bytes) that maximises backtracking, sends it to the endpoint, and waits for the worker process to consume CPU until either the request times out, the OOM killer fires, or the autoscaler gives up. With a single-threaded runtime like Node.js, one request blocks every other in-flight request on the same process; with a thread-per-request runtime like classic Java servlets, it blocks one thread but a few hundred concurrent requests blocks the whole pool.
2^n match paths through nested quantifiersReDoS is not a memory bug, an injection bug, or a logic bug. The regex is doing exactly what the engine was asked to do: enumerate every possible match path. With a vulnerable pattern, the number of paths grows exponentially in the length of the input, so a 50-byte string can pin a CPU core for minutes. On single-threaded runtimes (Node.js, single-worker Python ASGI, classic PHP-FPM with one worker), one request takes the whole process down.
The shape of a realistic exploit depends on where the regex sits in the request path. A regex inside a body parser fires before any application code, so the attacker does not need to authenticate. A regex inside a logger fires after the request returns, so the attack is invisible to the request timeout. A regex inside a search-filter compilation fires once per query, so a paid attacker with a free tier can amplify cost. Almost every disclosed ReDoS sits in one of these three positions, body parsing, logging, or filter compilation, because each one is reached by attacker bytes on every request.
Common Sink Surfaces in Real Applications
| Where the regex lives | Typical attacker payload |
|---|---|
| Email / phone / URL validators in form handlers | A 50-byte string matching the prefix but failing the final anchor |
| Express path-to-regexp route compilation | Routes like /:foo(((a+)+)+)b that compile slowly at startup or per-mount |
| Body / cookie / header parsers | Crafted Content-Type, Accept-Language, or Set-Cookie strings |
| Log parsers (Pino, Winston, structlog, Logstash grok) | A log line embedded in a user field that defeats the parser |
| Markdown / BBCode / syntax highlighters | A code block or link that triggers a vulnerable tokeniser regex |
| HTML sanitisers and CSS validators | Crafted style attribute or selector with overlapping alternation |
| WAF / SIEM / SOAR correlation rules | A request body that defeats a rule maintained by a different team |
| User-submitted alerting rules (Prometheus, Datadog, Sentry) | A label matcher regex the user controls directly |
| Auto-link / mention extraction in chat / forum software | A long string of @ signs and periods that thrash the URL detector |
path-to-regexp powers Express, Koa, Next.js Pages router, Vercel's edge routing, and dozens of other routers. It compiles route patterns like /users/:id into regular expressions. In 2024, two CVEs (CVE-2024-45296 and the follow-up CVE-2024-52798) shipped because the compiled regex for certain parameter patterns was exponential in the input length. The fix took two releases: clamp the parameter regex to a non-backtracking shape and then move to a hand-written parser that does not use regex at all. Almost every JavaScript router on the planet picked up the patch within a week.
1// VULNERABLE, the regex runs in the request thread (Node.js single-threaded
2// event loop) and a 60-byte adversarial input pins the CPU for minutes.
3
4const express = require('express');
5const app = express();
6
7// BAD, classic Stack-Overflow email pattern. Nested quantifiers around the
8// local-part create catastrophic backtracking on inputs that almost match.
9const EMAIL = /^([a-zA-Z0-9])(([\-.]|[_]+)?([a-zA-Z0-9]+))*(@){1}[a-z0-9]+[.]\w{2,}$/;
10
11app.post('/signup', express.json(), (req, res) => {
12 // No length cap, no timeout, no linear-time engine.
13 if (!EMAIL.test(req.body.email)) return res.status(400).json({ error: 'bad email' });
14 // ...
15 res.json({ ok: true });
16});
17
18// Attacker payload: "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!"
19// (30 a's followed by a non-matching char). The local-part regex tries
20// to partition the a-string across (([\-.]|[_]+)?([a-zA-Z0-9]+))* in
21// 2^30 different ways before giving up. CPU usage on the process: 100%.
22// Concurrent requests: stuck. Event loop: frozen.The team adds a 5-second request timeout in nginx in front of the Node.js process. A pentester reports the service still goes offline under ReDoS attack. Why?
03 //Payload Anatomy & Evil Regex Shapes
A regular expression becomes a ReDoS sink the moment it has an ambiguous parse over some input prefix, meaning more than one way for the engine to assign characters to quantified groups, combined with a final mismatch that forces the engine to try them all. The canonical shapes are nested quantifiers ((a+)+), overlapping alternation ((a|aa)+, (a|ab)+), and quantified optional groups ((a?){n}a{n}). Each shape produces exponential or polynomial blowup depending on the engine and the input.
- Nested quantifiers:
(a+)+$,(a*)*$,(a|a)*$ - Overlapping alternation:
(a|aa)+$,(a|ab)+$ - Quantified groups with optional content:
(a?)50a50 - Greedy + lazy collisions inside the same group:
(.*?)* - Email-validation patterns copied from Stack Overflow circa 2010
- CSS selectors, URL validators, and HTML tag scrubbers built with one big regex
- Body parsers and Express middleware (path-to-regexp, body-parser)
- Log parsers and Loki / Grafana label matchers
- WAF rules, SIEM correlation rules, and SOAR playbooks
- Markdown renderers and syntax highlighters (Prism, highlight.js)
- Validation libraries (joi, yup, zod refinements with custom regex)
- User-submitted regex (search filters, alerting rules, label selectors)
The shape to grep for is a quantifier inside a group followed by another quantifier on the group itself: (X+)+, (X*)*, (X+)*, or any alternation where two branches can match the same prefix ((a|a)+, (a|ab)+). If the regex runs on user input, treat the file as a finding until someone proves the pattern is linear-time on adversarial input.
Payload Catalogue, What an Attacker Actually Submits
| Vulnerable regex | Attacker input | Behaviour |
|---|---|---|
| /^(a+)+$/ | "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!" | Exponential 2^n: 30 a's plus one non-matching char takes seconds; 35 a's takes minutes |
| /^(a*)*$/ | "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!" | Same exponential blowup; empty matches make the inner quantifier explore all splits |
| /^(a|aa)+$/ | "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!" | Overlapping alternation: each a can match either branch, giving 2^n parses |
| /^(a|ab)+b$/ | "aaaaaaaaaaaaaaaaaaaaaaaaaaab" | Exponential before the trailing b is reached |
| /^.*a.*a.*a.*a.*a.*$/ | "aaaaaaaaaaaaaaaaaaaaaaaaaab" | Polynomial O(n^5); generalises to O(n^k) for k repeated .* patterns |
| /^\s*([a-z]+\s*)+$/i | "a a a a a a a a a a a a a a a a a a a a!" | Greedy whitespace + greedy word causes ambiguous splits at every space |
| /^([a-zA-Z0-9])(([\-.]|[_]+)?([a-zA-Z0-9]+))*@/ | "aaaaaaaaaaaaaaaaaaaaaaaaaaaa!" | The classic email validator; nested quantifiers in the local part |
| /^(\d+)*$/ | "1111111111111111111111111111x" | Numeric variant of (a+)+; identical exponential behaviour |
1// 1) The textbook evil regex. Nested quantifier on the same character class.
2// Input length 30 takes seconds, 35 takes minutes, 40 takes hours.
3const EVIL_1 = /^(a+)+$/;
4
5// 2) Overlapping alternation. Two branches that can match the same prefix.
6// Same exponential profile because each character has two parse choices.
7const EVIL_2 = /^(a|ab)+b$/;
8
9// 3) Polynomial ReDoS, less dramatic but harder to spot in code review.
10// Each .* doubles the parse choices; five .* groups give O(n^5) work.
11const EVIL_3 = /^.*a.*a.*a.*a.*a.*$/;
12
13// Reproduce the blowup in a REPL (do not run on a production process):
14// const start = Date.now();
15// EVIL_1.test('a'.repeat(35) + '!');
16// console.log(`elapsed: ${Date.now() - start} ms`);
17//
18// On a 2025 laptop, n=30 takes ~1s, n=35 takes ~30s, n=40 takes ~15 min.JavaScript regex execution is synchronous and runs on the event loop thread. There is no cross-thread interrupt; setTimeout fires after the current synchronous block completes. The same is true of Python's default re module: the GIL is held throughout the match, signal-based timeouts only work on the main thread on Unix, and there is no portable way to cancel a regex in flight. A "timeout" implemented in user-land is a no-op against ReDoS. The only ways to enforce a timeout are (a) move the regex into a worker thread or subprocess and kill it, (b) switch to a linear-time engine, or (c) cap the input length before the regex runs.
A reviewer says 'we cap user input at 10 KB, so we are safe from ReDoS'. A teammate replies 'the exponential regex /^(a+)+$/ takes minutes on 40 bytes of input'. Who is right and what is the minimal correct rule?
04 //Mitigation Patterns
There are four correct mitigations, in rough order of preference. Use a linear-time regex engine (RE2 via re2, node-re2, or Go's standard regexp) for any pattern that runs on user-controlled input; rewrite the pattern to be unambiguous (no nested quantifiers, no overlapping alternation, anchored where possible); cap input length before the regex runs to the smallest value the feature needs; and run the regex inside a worker thread or subprocess with a hard wall-clock timeout that the runtime can actually enforce. Each one is a few lines of code; the discipline is to apply them at every user-input regex, not just the obvious ones.
Mitigation Options at a Glance
| Strategy | How | When to use |
|---|---|---|
| Switch to a linear-time engine | node-re2 / re2 in JS, google-re2 in Python, Hyperscan in C/C++, Rust regex crate, Go regexp | Always, for any regex that runs on attacker bytes. Linear-time engines do not backtrack and cannot be DoSed. |
| Rewrite the pattern | Remove nested quantifiers, possessive quantifiers where supported, atomic groups, character classes instead of alternation | When you must use the default engine (regex includes backreferences, lookaround, or non-trivial features RE2 lacks). |
| Cap input length | Express body-parser limit, validator.js isLength, or a guard at the controller before the regex runs | Backstop for polynomial patterns. Useless alone against exponential blowup but reduces blast radius. |
| Wall-clock timeout in a sandbox | worker_threads + Atomics.wait, child_process.execFile, .NET Regex.MatchTimeout, Java JDK 9+ patterns | When the regex genuinely needs backtracking features and rewriting is not an option. |
| Static analysis in CI | safe-regex, regexp-tree, recheck, rxxr, ReScue, Cody on regex literals | Always. Catches the most common evil shapes at code-review time before the regex ever ships. |
| Ban user-submitted regex | If users need search filters, expose a structured DSL or compile through RE2; do not pass user regex to /.../ | Anywhere user-supplied patterns reach a regex engine: alerting rules, log filters, search boxes. |
1// lib/safe-regex.js, the only allowed regex constructor for user-input patterns.
2//
3// Strategy: every regex that runs on user input is constructed through RE2,
4// a linear-time engine without backtracking. The default JavaScript engine
5// is still allowed for hard-coded patterns proven safe by safe-regex in CI.
6
7const RE2 = require('re2');
8
9// USE THIS for any regex that touches user input. RE2 refuses to compile
10// patterns it cannot run in linear time, so /(a+)+$/ throws at construction
11// instead of pinning the CPU at request time.
12function safeRegex(source, flags = '') {
13 try {
14 return new RE2(source, flags);
15 } catch (err) {
16 // RE2 throws on patterns it cannot run linearly. That is the correct
17 // outcome: the pattern is rejected at construction time, in CI or at
18 // service startup, instead of blowing up at runtime on adversarial input.
19 throw new Error(`regex rejected by RE2: ${err.message}`);
20 }
21}
22
23// Pre-compile at module load. If a pattern is unsafe, the service fails to
24// start, which surfaces the bug in CI rather than in production.
25const EMAIL_RE = safeRegex('^[^\\s@]+@[^\\s@]+\\.[^\\s@]+$');
26
27function validateEmail(input) {
28 if (typeof input !== 'string') return false;
29 if (input.length > 254) return false; // RFC 5321 cap, applied BEFORE the regex.
30 return EMAIL_RE.test(input);
31}
32
33module.exports = { safeRegex, validateEmail };RE2 trades expressiveness for safety. Patterns that use backreferences ((\w+)\s+\1), lookbehind ((?<=foo)bar), or lookahead (foo(?=bar)) do not compile in RE2. That is usually a good thing: if the regex needs a backreference, it is probably trying to do something a parser would handle better. When you absolutely need lookaround, accept the engine limitation by either (a) splitting the regex into two linear passes that achieve the same effect, or (b) keeping the backtracking engine but running the regex in a sandboxed worker with a wall-clock timeout that the OS can enforce.
1// Before: catastrophic backtracking on near-misses.
2const BAD_EMAIL = /^([a-zA-Z0-9])(([\-.]|[_]+)?([a-zA-Z0-9]+))*(@){1}[a-z0-9]+[.]\w{2,}$/;
3
4// After: anchored, no nested quantifier, no overlapping alternation. Each
5// character has exactly one parse path, so the engine runs in O(n).
6const GOOD_EMAIL = /^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$/;
7
8// Better yet: use a real email parser (e.g., RFC 5321 implementation) for
9// authoritative validation, and reserve regex for cheap structural checks.
10function isStructurallyValidEmail(s) {
11 if (typeof s !== 'string') return false;
12 if (s.length > 254) return false; // RFC cap.
13 if (s.indexOf('@') !== s.lastIndexOf('@')) return false; // Reject multiple @.
14 return GOOD_EMAIL.test(s);
15}
16
17// The two-pass form: split before regex to remove ambiguity.
18// Bad: /^(\w+)+\s+(\w+)+$/ (nested + overlapping)
19// Good:
20function parsePair(s) {
21 const parts = s.trim().split(/\s+/, 3);
22 if (parts.length !== 2) return null;
23 if (!/^\w+$/.test(parts[0]) || !/^\w+$/.test(parts[1])) return null;
24 return { first: parts[0], second: parts[1] };
25}1# safe_regex.py, the only allowed regex constructor for user-input patterns.
2#
3# google-re2 wraps the same RE2 engine used by Go. It refuses to compile
4# patterns that cannot run in linear time, surfacing unsafe regex in CI
5# instead of in production.
6
7import re2 # pip install google-re2
8
9MAX_LEN = 1024
10
11EMAIL_RE = re2.compile(r"^[^\s@]+@[^\s@]+\.[^\s@]+$")
12
13def validate_email(value: str) -> bool:
14 if not isinstance(value, str):
15 return False
16 if len(value) > 254: # RFC 5321 cap BEFORE the regex.
17 return False
18 return EMAIL_RE.fullmatch(value) is not None
19
20# If google-re2 cannot be installed (some platforms), fall back to the
21# stdlib re module but ONLY for patterns audited offline with safe-regex
22# or a similar static analyser, AND with a strict length cap.
23import re
24
25def safe_match(pattern: "re.Pattern[str]", value: str, max_len: int = MAX_LEN) -> bool:
26 if len(value) > max_len:
27 return False
28 return pattern.fullmatch(value) is not None1// SafeRegex.java, runs the regex in a fork-join task with a hard wall-clock
2// timeout. Java does not interrupt running regex by default,
3// but explicit Thread.interrupt() at safe points works on
4// modern JDKs combined with cooperative cancellation.
5//
6// For production, prefer Google's RE2/J library (com.google.re2j:re2j) which
7// is a drop-in replacement that runs in linear time.
8
9import com.google.re2j.Pattern;
10import com.google.re2j.Matcher;
11
12public final class SafeRegex {
13 private static final int MAX_INPUT = 1024;
14 private static final Pattern EMAIL =
15 Pattern.compile("^[^\\s@]+@[^\\s@]+\\.[^\\s@]+$");
16
17 public static boolean validateEmail(String input) {
18 if (input == null) return false;
19 if (input.length() > 254) return false; // RFC cap.
20 if (input.length() > MAX_INPUT) return false; // Defensive cap.
21 Matcher m = EMAIL.matcher(input);
22 return m.matches();
23 }
24}
25
26// RE2/J trades backreferences and lookaround for linear-time guarantees.
27// If you need those features, run java.util.regex inside an ExecutorService
28// task with a future.get(timeout, MILLISECONDS) and Thread.interrupt() on
29// timeout. java.util.regex respects the interrupt at backtracking points
30// on JDK 9+, though the cancellation can lag by milliseconds.1// Go's standard regexp package is built on RE2 and is linear-time by
2// construction. There is no ReDoS in idiomatic Go regex usage. The trade-off
3// is the same as RE2 in other languages: no backreferences, no lookaround.
4
5package validate
6
7import (
8 "regexp"
9)
10
11var emailRe = regexp.MustCompile(`^[^\s@]+@[^\s@]+\.[^\s@]+$`)
12
13func IsEmail(s string) bool {
14 if len(s) > 254 {
15 return false
16 }
17 return emailRe.MatchString(s)
18}
19
20// If you import a third-party regex library (rare but possible, e.g., for
21// PCRE compatibility via github.com/dlclark/regexp2), you re-introduce
22// backtracking and lose the linear-time guarantee. Stay on the standard
23// library unless you have a specific reason to leave it.1// The Rust regex crate is linear-time by construction (NFA + lazy DFA).
2// Like RE2, it does not support backreferences or arbitrary lookaround.
3// For PCRE-compatible patterns the fancy-regex crate is available but
4// re-introduces backtracking and ReDoS risk; do not use it on user input.
5
6use once_cell::sync::Lazy;
7use regex::Regex;
8
9static EMAIL: Lazy<Regex> = Lazy::new(|| {
10 Regex::new(r"^[^\s@]+@[^\s@]+\.[^\s@]+$").expect("valid pattern")
11});
12
13pub fn is_email(s: &str) -> bool {
14 if s.len() > 254 {
15 return false;
16 }
17 EMAIL.is_match(s)
18}
19
20// The crate also exposes RegexBuilder::size_limit and dfa_size_limit, which
21// bound memory rather than time. Set them on every Regex built from user
22// input (e.g., search-filter compilation) to bound compile-time DoS.The pattern that actually works in practice is the same as for prepared statements, HTML sanitisation, and outbound link rendering: one helper in a shared library, plus a lint rule that flags any regex literal that runs on user input outside the helper. The helper is twenty lines of code. The discipline is what closes the bug class, every new validator goes through the helper, and code review enforces that.
A developer says 'we use Go, so we are immune to ReDoS'. Is that correct?
05 //Conclusion
ReDoS is a textbook case of trust transferred from the regex author to the regex engine. The application is correct, the engine is correct, the operating system is correct. The vulnerability lives in the moment a backtracking engine is handed a pattern with ambiguous parses and an input that maximises them. Fix it the same way every adjacent bug class is fixed: a single helper backed by a linear-time engine, applied at every user-input regex, paired with length caps, static analysis in CI, and a runtime posture that contains the worst case.
Use a linear-time engine (RE2, RE2/J, Hyperscan, Go stdlib, Rust regex) for every regex on user input · Eliminate evil-regex shapes: nested quantifiers, overlapping alternation, quantified optional groups · Cap input length before the regex runs (RFC limits, body-parser caps, controller guards) · Ban new RegExp(userInput) outside a vetted user-regex helper · Pin path-to-regexp, marked, minimatch, and other regex-heavy libraries to patched versions · Run safe-regex, recheck, or CodeQL's ReDoS queries in CI on every change · Run risky regex inside a worker-thread or subprocess with a wall-clock timeout the OS can enforce · Pair with body-size limits at the proxy, concurrency caps per endpoint, and worker restart policies.
When you see a new regex in a diff, the first question is not "does it match the right strings", it is "what is the worst-case runtime on adversarial input, and what engine will run it?" If the answer involves a backtracking engine on user-controlled bytes without a linear-time wrapper, every byte the user can submit is potentially a CPU-exhaustion primitive. The fix is one helper away. Pair this module with the Command Injection, SQL Injection, and Rate Limiting guides for the adjacent classes where the bug is "an input parser hands the attacker more work than they paid for."