HTTP Parameter Pollution Code Review Guide
Introduction
HTTP Parameter Pollution (HPP) is a class of vulnerability that arises when a single HTTP message contains the same parameter name more than once and the components in the request path disagree on which value — or which combination of values — to use. The HTTP specs say almost nothing about this case. RFC 3986 defines URL syntax but leaves query-string semantics "application-specific." The HTML form encoding spec and the URL Living Standard hint at array semantics but stop short of mandating one. Every web framework, every reverse proxy, and every WAF therefore makes its own choice, and the attacker picks the gap.
HPP is a cousin of both HTTP request smuggling and SAML signature wrapping: the same byte stream, two parsers, divergent outputs. The difference is that HPP needs no protocol-level trickery at all. It lives entirely inside RFC-compliant HTTP and inside parameter syntax that every framework documents. That is what makes it both common and easy to miss in review: nothing about the request looks malformed, the test suite passes against any one component in isolation, and the bug only shows up when you trace the value through the trust chain.
Duplicate parameters are not malformed
Many engineers' first instinct is "just reject duplicate keys." That instinct is sound only if you implement it at the right layer and leave room for legitimate lists. The HTML <form> element produces multi-valued submissions for things like checkbox groups, and many APIs accept tag=a&tag=b&tag=c for legitimate list semantics. The fix is not to ban duplicates everywhere, but to declare one parser authoritative and reject anything that parser finds ambiguous.
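A minimal sketch of what "one authoritative parser" could look like, using only Python's standard library. The names `parse_strict`, `AmbiguousParameterError`, and `list_params` are hypothetical, chosen for this illustration; the idea is simply that list-valued parameters are declared up front and every other repeated name is refused.

```python
from urllib.parse import parse_qsl


class AmbiguousParameterError(ValueError):
    """Raised when a parameter repeats without being declared as a list."""


def parse_strict(query: str, list_params: frozenset = frozenset()) -> dict:
    """Parse a query string once, rejecting undeclared duplicate names.

    Names in list_params always map to a list of values; every other
    name maps to exactly one string.
    """
    parsed: dict = {}
    for name, value in parse_qsl(query, keep_blank_values=True):
        if name in list_params:
            parsed.setdefault(name, []).append(value)
        elif name in parsed:
            raise AmbiguousParameterError(f"duplicate parameter: {name!r}")
        else:
            parsed[name] = value
    return parsed
```

With this shape, `tag=a&tag=b` is accepted when `tag` is declared as a list, while `amount=10&amount=10000` raises before any business logic sees it.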
One Query String, Two Verdicts
    GET /transfer?amount=10&amount=10000&to=alice HTTP/1.1
    Host: bank.example

Validator (reads the first value): amount = 10
- Within daily limit, so the request is approved
- Audit log records a $10 transfer

Backend ORM (reads the last value): amount = 10000
- Debits the account for the larger value: $10,000 leaves the account
- Audit log still says $10
The pattern: a single duplicate-key query string is parsed by two components in the same request path. Whichever picks the first value makes the security decision; whichever picks the last value performs the action. The attacker lives in the gap.
The diagram above is the entire bug class in one picture. Replace "validator" with "WAF," "rate limiter," "feature-flag service," or "audit logger." Replace "backend ORM" with "authorization handler," "payment processor," or "cache key builder." The structural invariant — one query string, two components, divergent multisets of parameters — is what matters.
- Authorization bypass: the auth layer reads `role=user`, the application reads `role=admin`.
- Audit-log spoofing: the logger captures one value while the operation runs against another.
- WAF / SQLi bypass: the WAF parses the "safe" first parameter, the database query is built from the smuggled second one.
- Mass assignment: strong-parameter filters key on first occurrence, the ORM merges by last occurrence (or vice versa).
- Cache poisoning: the cache key uses one normalisation, the backend uses another, so attacker-controlled responses are served to other users.
- Rate-limit evasion: the rate limiter buckets on the first occurrence of an identifier, the action runs against the smuggled second occurrence with a different identity.
- Open redirect / SSRF amplification: URL-construction code joins the first `url` parameter with another component that is routed by the last.
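The first-wins / last-wins split behind the transfer example can be reproduced in a few lines. This is an illustrative sketch only: `first_value` and `last_value` are hypothetical stand-ins for the validator and the backend, and `DAILY_LIMIT` is an assumed figure.

```python
from urllib.parse import parse_qsl

QUERY = "amount=10&amount=10000&to=alice"
DAILY_LIMIT = 500  # hypothetical limit for the illustration


def first_value(query: str, name: str):
    """First-wins policy, standing in for the validator."""
    for key, value in parse_qsl(query):
        if key == name:
            return value
    return None


def last_value(query: str, name: str):
    """Last-wins policy, standing in for the backend."""
    return dict(parse_qsl(query)).get(name)


checked = int(first_value(QUERY, "amount"))  # value the validator approves
debited = int(last_value(QUERY, "amount"))   # value the backend acts on

approved = checked <= DAILY_LIMIT
print(f"approved={approved} checked={checked} debited={debited}")
# prints: approved=True checked=10 debited=10000
```

Neither helper is wrong on its own; the gap only exists because both run against the same request.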
A team reviews a finding: a request with `?role=user&role=admin` lets a low-privileged user reach an admin endpoint. The proposed fix is to add a WAF rule that blocks the literal string `role=admin`. What is the correct framing?
Framework Behavior
The first thing every reviewer should commit to memory is what the most common frameworks do when they see ?a=1&a=2. There is no "right" answer — every choice listed below is reasonable in some context — but mixing two of them in the same request path is almost always a bug.
Same Input ?role=user&role=admin, Different Verdicts
- Express: req.query.role → array ['user','admin']
- Flask: request.args.get → first 'user'; getlist → ['user','admin']
- Django: request.GET['role'] → last 'admin'
- PHP: $_GET['role'] → last 'admin' (only role[] is array)
- ASP.NET Core: Request.Query["role"] → comma-joined 'user,admin'
- Go: r.URL.Query().Get → first 'user'
- Spring MVC: @RequestParam String → comma-joined 'user,admin'
- Rails: params[:role] → last 'admin' (role[]=... for arrays)
- Java Servlet: getParameter → first 'user'; getParameterValues → array

Default Parameter-Pollution Behavior by Stack
| Stack | Single-key access | Array access |
|---|---|---|
| Node.js (Express + qs) | Returns an <strong>array</strong> as soon as a key is duplicated — same code path that returned a string yesterday now returns <code>string[]</code>. | <code>req.query.role</code> is already <code>['user','admin']</code> |
| Node.js (Fastify, Koa; default querystring-style parsers) | Same observable behavior as Express: a duplicated key becomes an array. | <code>request.query.role</code> (Fastify), <code>ctx.query.role</code> (Koa) |
| Python (Flask / Werkzeug) | <code>request.args.get('role')</code> returns the <strong>first</strong> value. | <code>request.args.getlist('role')</code> |
| Python (Django) | <code>request.GET['role']</code> returns the <strong>last</strong> value. | <code>request.GET.getlist('role')</code> |
| PHP | <code>$_GET['role']</code> returns the <strong>last</strong> unless the key uses array syntax (<code>role[]</code>). | <code>$_GET['role']</code> is an array only if the request used <code>?role[]=...</code> |
| Java (Tomcat / Servlet) | <code>request.getParameter("role")</code> returns the <strong>first</strong>. | <code>request.getParameterValues("role")</code> |
| Java (Spring MVC <code>@RequestParam String</code>) | Comma-joins all values: <code>"user,admin"</code>. Subtle: the value <em>contains a comma</em>. | <code>@RequestParam List&lt;String&gt; role</code> |
| .NET (ASP.NET Core <code>Request.Query</code>) | Returns <code>StringValues</code>, which is comma-joined when implicitly converted to <code>string</code>. | Iterate <code>StringValues</code> directly |
| Go (<code>net/http</code>) | <code>r.URL.Query().Get("role")</code> returns the <strong>first</strong> value. | <code>r.URL.Query()["role"]</code> |
| Ruby on Rails | <code>params[:role]</code> returns the <strong>last</strong>. | <code>?role[]=...&role[]=...</code> |
| Perl (CGI) | <code>param('role')</code> returns the <strong>first</strong>; in list context returns all values. | <code>param('role')</code> in list context |
| Python (FastAPI / Pydantic) | Defaults to <strong>last</strong> unless field is typed as <code>list[T]</code>. | Type the parameter <code>list[T]</code> |
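The table can be turned into a small differential check that a reviewer runs against a suspicious query string. The sketch below is an approximation under stated assumptions: the policy names `first`, `last`, `join`, and `array` are shorthand invented here for the behaviors in the table, and `resolve` / `divergent_params` are hypothetical helpers, not part of any framework.

```python
from urllib.parse import parse_qs


def resolve(values, policy):
    """Collapse duplicate values the way one row of the table would."""
    if policy == "first":
        return values[0]
    if policy == "last":
        return values[-1]
    if policy == "join":
        return ",".join(values)
    if policy == "array":
        # Scalar for a single value, array as soon as the key repeats.
        return values[0] if len(values) == 1 else list(values)
    raise ValueError(f"unknown policy: {policy}")


def divergent_params(query, policies=("first", "last", "join", "array")):
    """Return {name: {policy: value}} for every name the policies disagree on."""
    report = {}
    for name, values in parse_qs(query, keep_blank_values=True).items():
        seen = {policy: resolve(values, policy) for policy in policies}
        if len({repr(value) for value in seen.values()}) > 1:
            report[name] = seen
    return report


print(divergent_params("role=user&role=admin&page=2"))
# {'role': {'first': 'user', 'last': 'admin', 'join': 'user,admin', 'array': ['user', 'admin']}}
```

A parameter that appears once (`page` here) never shows up in the report, which is the property that makes single-valued requests safe across a mixed fleet.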
The mixed-fleet trap
A modern app rarely runs on one stack. The CDN parses the URL one way, the WAF parses it another, the API gateway parses it a third, and the microservice behind it parses it a fourth. Each component is internally consistent. Your trust chain is only as safe as the first place a security-relevant value is read: if any downstream component re-parses the raw URL or body and gets a different answer, you have an HPP bug. The fix is to parse once at the edge and propagate the structured result.
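"Parse once at the edge and propagate the structured result" might look like the following sketch. `EdgeRequest`, `parse_at_edge`, `check_limit`, and `perform_transfer` are hypothetical names for this illustration; the point is that both the check and the action receive the same read-only structure and never touch the raw URL again.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping
from urllib.parse import parse_qsl, urlsplit


@dataclass(frozen=True)
class EdgeRequest:
    """What downstream code receives instead of the raw URL."""
    path: str
    params: Mapping[str, str]


def parse_at_edge(raw_url: str) -> EdgeRequest:
    """Parse the URL exactly once and refuse anything ambiguous."""
    parts = urlsplit(raw_url)
    params = {}
    for name, value in parse_qsl(parts.query, keep_blank_values=True):
        if name in params:
            raise ValueError(f"duplicate parameter: {name!r}")
        params[name] = value
    # Read-only view, so no later layer can swap a value after validation.
    return EdgeRequest(path=parts.path, params=MappingProxyType(params))


def check_limit(req: EdgeRequest, daily_limit: int) -> bool:
    """Security decision: reads from the shared structure."""
    return int(req.params["amount"]) <= daily_limit


def perform_transfer(req: EdgeRequest) -> tuple:
    """Action: reads from the same structure, not from the raw URL."""
    return req.params["to"], int(req.params["amount"])
```

In a real deployment the structured result would more likely travel as a signed header or an internal RPC message, but the invariant is the same: one parse, one answer.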
The same input, four answers, in 30 lines
    // Demonstration script: feed the same query string to the parsers that
    // commonly coexist in a real Node + nginx + Spring + Python pipeline.
    // Requires the third-party "qs" package (npm install qs).

    const querystring = require("querystring");
    const qsLib = require("qs");

    const url = "/api/transfer?amount=10&amount=10000&to=alice";
    const rawQuery = url.split("?")[1];
    const qs = new URL("http://x" + url).searchParams;

    console.log("WHATWG URLSearchParams.get ->", qs.get("amount"));
    // "10" (first value)

    console.log("WHATWG URLSearchParams.getAll ->", qs.getAll("amount"));
    // ["10","10000"]

    console.log("Node querystring.parse ->", querystring.parse(rawQuery).amount);
    // ["10","10000"] (array)

    console.log("qs.parse (Express default) ->", qsLib.parse(rawQuery).amount);
    // ["10","10000"]

    // In Python: dict(parse_qsl(...))['amount']   -> "10000" (last)
    // In Java/Spring with String binding:         -> "10,10000" (joined)
    // In .NET StringValues -> string:             -> "10,10000"
    // In Go r.URL.Query().Get:                    -> "10"

    // Of the eight outputs, three are single scalars (two carry the first
    // value, one carries the last), two are comma-joined strings, and three
    // are arrays. Any two different shapes in the same chain = HPP.

A reviewer is auditing a Node + Spring service. The Node front-end accepts `?role=user&role=admin` and forwards `req.query.role` (an array) into a Spring back-end as a query string. Spring binds it to `@RequestParam String role`. What does Spring see, and why does it matter?
Validation Bypass Patterns
Once you know that two parsers can pick different values, the exploit shape writes itself: route the validation through the parser that picks the safe value, route the action through the parser that picks the unsafe one. Below are the patterns that appear most often in real review.
Pattern 1: validator and ORM disagree on first/last
    # Flask + SQLAlchemy. Both internally consistent, but the route handler
    # reads the FIRST occurrence and the action helper reads the LAST.

    from urllib.parse import parse_qs

    from flask import Flask, request, abort
    from flask_login import current_user  # assumption: Flask-Login supplies the user

    app = Flask(__name__)


    def do_transfer(**kwargs):
        """Placeholder for the SQLAlchemy-backed helper that moves the money."""


    @app.post("/transfer")
    def transfer():
        # Validator path: Werkzeug MultiDict.get -> first value.
        amount = request.args.get("amount", type=int)
        if amount is None or amount > current_user.daily_limit:
            abort(403)

        # Action path: the helper takes **kwargs rebuilt from the raw query
        # string. parse_qs returns a list per key; v[-1] keeps the LAST value.
        kwargs = {k: v[-1] for k, v in parse_qs(request.query_string.decode()).items()}
        do_transfer(**kwargs)  # uses amount='10000'
        return "", 204

    # Attacker request: POST /transfer?amount=10&amount=10000
    # request.args.get("amount")    -> 10      (passes the limit check)
    # parse_qs(...)['amount'][-1]   -> '10000' (the actual transfer amount)

Pattern 2: framework auto-arrays surprise the validator
    // Express + a permissive validator that accepts strings and arrays.
    // Assumes express.urlencoded({ extended: true }) populates req.body.

    app.post("/api/coupon", (req, res) => {
      // Validator: a regex on the string form.
      const code = req.body.code;
      if (typeof code === "string" && /^[A-Z0-9]{8}$/.test(code)) {
        return applyCoupon(code);
      }
      if (Array.isArray(code) && code.every(c => /^[A-Z0-9]{8}$/.test(c))) {
        return applyCoupon(code[0]);
      }
      return res.status(400).end();
    });

    // Attack input:
    //   POST body: code=ABCDEFGH&code=<script>alert(1)</script>
    //   req.body.code = ["ABCDEFGH", "<script>...</script>"]  (Express + qs)
    //
    // The handler above is safe as written: .every() tests every element,
    // so the polluted array is rejected with a 400.
    //
    // The real bug appears when only the string branch validates:
    //   if (typeof code === "string") validateString(code);
    //   else applyCoupon(code);          // <-- accepts any array!
    //
    // or when the validator looks at the FIRST element only:
    //   const first = Array.isArray(code) ? code[0] : code;
    //   if (regex.test(first)) applyCoupon(code);  // applies the WHOLE array.
    //
    // Both shapes are alarmingly common in real review. Fix: bind code as a
    // scalar with a strict schema (Zod/Yup/express-validator) and reject any
    // request where it arrives as an array.

Pattern 3: cache key vs origin disagree
    # Varnish/Cloudflare cache key uses the URL "as-seen-by-the-cache."
    # The cache may normalise ?a=1&a=2 to one canonical form.
    # The origin parses ?a=1&a=2 and may pick first or last.

    # If the cache treats "?role=user" and "?role=user&role=admin" as the
    # SAME cache key (because role's first value is "user" for both), then
    # an attacker primes the cache with the polluted variant, the origin
    # returns admin content, and every subsequent request to "?role=user"
    # is served the admin response.

    # Cache-key customisations are particularly implicated: sorting query
    # parameters, ignoring selected parameters, or collapsing duplicates
    # all change which URLs share a key. The defenders' job is to either
    # (a) include the FULL query string in the cache key, byte-for-byte,
    # or (b) reject duplicate parameters at the cache layer.

The shape that catches most reviewers
The textbook HPP example uses ?role=user&role=admin — obvious in any diff. The bugs that ship to production hide one level deeper: the duplicate is a sub-parameter inside a JSON body, or a filter in a GraphQL query, or a query-string fragment appended to a redirect URL by an OAuth flow. The pattern is always the same: somewhere, the same name appears twice, and the trust chain disagrees on what that means.
- `?id=1&id=2`: canonical query-string variant.
- `?id=1%26id=2`: URL-encoded ampersand. A parser that splits before decoding sees one parameter; any later component that decodes first, or decodes a second time, sees two.
- `?id=1;id=2`: semicolon separator (legacy; honoured by some parsers, ignored by others).
- `?id[]=1&id[]=2`: PHP-style array syntax.
- `?id=1&id=2&id[]=3`: mixing scalar and array forms.
- `?id=1\nid=2`: newline injected into the query string, ignored by most but not all parsers.
- `{"id":1,"id":2}`: duplicate JSON keys (last-wins / first-wins differs by library).
- Two multipart parts that both declare `Content-Disposition: form-data; name="id"`, one carrying `1` and the other `2`.
- `id=1` in the body and `?id=2` in the URL: mixing transports (some frameworks merge, with last-wins or first-wins differing).
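For the duplicate-JSON-key variant, Python's standard `json` module silently keeps the last value, but it exposes a hook that makes strict parsing straightforward. The sketch below assumes rejection is the desired policy; `reject_duplicate_keys` and `loads_strict` are hypothetical helper names.

```python
import json


def reject_duplicate_keys(pairs: list) -> dict:
    """object_pairs_hook that fails on a repeated key within one object."""
    obj = {}
    for key, value in pairs:
        if key in obj:
            raise ValueError(f"duplicate JSON key: {key!r}")
        obj[key] = value
    return obj


def loads_strict(text: str):
    """json.loads variant that refuses ambiguous objects at any nesting depth."""
    return json.loads(text, object_pairs_hook=reject_duplicate_keys)


print(json.loads('{"id": 1, "id": 2}'))  # default behavior: {'id': 2}
```

The hook runs for every object in the document, so a duplicate buried inside a nested structure is caught as well.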
A reviewer sees this code in a Django project: `user_id = request.GET['user_id']` followed by `Profile.objects.get(id=user_id)`. The team’s Cloudflare in front of Django strips duplicate query parameters, keeping the FIRST. Is HPP exploitable?
Attack Surface
HPP is not a single bug but a family of bugs that arise wherever the same name can appear twice in a request. The four classic transports are query strings, URL-encoded form bodies, JSON bodies, and HTTP headers; modern stacks add multipart bodies, GraphQL variable maps, and structured cookies. Each of them has different normalisation rules in different frameworks, so each is an HPP surface.
HPP Transport Map
| Transport | Where duplicates come from | Typical bug shape |
|---|---|---|
| Query string | Browser-built URLs from form GETs, attacker-crafted links, OAuth redirect handling that appends parameters. | Validator/handler split, WAF bypass, cache poisoning. |
| URL-encoded form body | HTML <code>&lt;form&gt;</code> with multiple inputs of the same name (checkboxes), attacker-crafted POST bodies. | Mass assignment, role/permission smuggling. |
| JSON body | Attacker-crafted bodies; some clients/proxies emit duplicate keys when merging. | Last-wins / first-wins disagreements between signer and consumer (compare JWT alg confusion, a related verifier/consumer disagreement). |
| Multipart body | HTML <code>&lt;form enctype="multipart/form-data"&gt;</code> with multiple fields of the same name; file-upload forms with both filename and content fields. | File-content/filename smuggling, role smuggling, validator/storage disagreement. |
| HTTP headers | Most clients deduplicate, but proxies sometimes append. <code>X-Forwarded-For</code> chains are duplicates by design. | IP allowlist bypass, source-of-truth confusion, header injection escalation. |
| Cookies | Multiple <code>Cookie:</code> headers or repeated names within one header; proxies that merge cookie jars. | Session fixation, CSRF-token confusion. |
| GraphQL variables | Operation document and variables map both define a value; some clients send duplicate JSON keys in variables. | Authorization smuggling at the resolver layer. |
| Path parameters / matrix params | <code>/users/123;role=admin</code> — legacy syntax accepted by Tomcat / JAX-RS, ignored elsewhere. | Routing confusion, auth bypass, traversal-adjacent. |
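Duplicates do not have to sit in the same transport. The following sketch flags any name that repeats across a query string and a URL-encoded form body, under the assumption that the body is `application/x-www-form-urlencoded`; `repeated_names` is a hypothetical helper for this illustration.

```python
from collections import Counter
from urllib.parse import parse_qsl


def repeated_names(query: str, form_body: str = "") -> dict:
    """Map each ambiguous parameter name to the transports it appears in.

    A name is ambiguous when it occurs more than once in total, whether
    twice in one transport or once in each.
    """
    counts = Counter()
    where = {}
    for transport, raw in (("query", query), ("body", form_body)):
        for name, _ in parse_qsl(raw, keep_blank_values=True):
            counts[name] += 1
            where.setdefault(name, []).append(transport)
    return {name: where[name] for name, n in counts.items() if n > 1}


print(repeated_names("id=2&page=1", "id=1"))
# {'id': ['query', 'body']}
```

The same idea extends to the other rows of the table (headers, cookies, multipart parts) by adding more sources to the loop.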
Header HPP: X-Forwarded-For trust
    # A common pattern: trust X-Forwarded-For for client IP, allowlist by IP.

    def get_client_ip(request):
        # Naive: read the first hop, trusting that the proxy added itself last.
        forwarded = request.headers.get("X-Forwarded-For", "")
        return forwarded.split(",")[0].strip()

    # Attacker sends:
    #   X-Forwarded-For: 8.8.8.8
    # Your proxy appends the address it actually saw, so the app receives:
    #   X-Forwarded-For: 8.8.8.8, 10.0.0.1
    #
    # Naive parser returns 8.8.8.8.
    # Real proxy chain: 10.0.0.1 was the actual client. The forged leading hop
    # wins because get_client_ip reads the LEFTMOST entry instead of the
    # rightmost-trusted one.
    #
    # Even worse: many frameworks see DUPLICATE X-Forwarded-For headers
    # (one from the attacker's request, one added by your proxy) as a single
    # comma-joined string. .split(',')[0] then returns whatever the attacker put
    # in the header they controlled, not what your proxy added.

    # Fix:
    #   1. Configure the trusted proxy chain explicitly. Walk from the right.
    #   2. Reject requests with X-Forwarded-For that don't match the expected hop count.
    #   3. Use the framework's "TrustedProxies" / "ProxyFix" middleware.

Cookie HPP: two session ids
    # Two Set-Cookie headers, or two key=value pairs with the same name in
    # one Cookie header. RFC 6265 says servers SHOULD NOT rely on the order
    # in which same-named cookies appear. Real-world implementations pick
    # first or last anyway, and they do not agree.

    # Server-set:
    #   Set-Cookie: session=valid_session
    # Attacker sets via JS / subdomain takeover / older request:
    #   Set-Cookie: session=attacker_session

    # The Cookie header sent back may now be:
    #   Cookie: session=attacker_session; session=valid_session
    # OR the reverse. Browsers and servers disagree on which one wins.

    # In Python:
    #   request.cookies['session']  # Werkzeug: returns only one of them
    # In Node:
    #   req.headers.cookie          # raw string with both
    #   req.cookies.session         # parsed: depends on parser
    #
    # The bug: the auth layer reads cookies via parsed dict (first/last varies),
    # the rate limiter reads the raw header string and does its own split. Now
    # you have two notions of "the session" coexisting in one request and the
    # attacker can pin one while operating under the other.

    # Fix: SameSite=Strict + Secure + Path=/ for the canonical cookie, and
    # reject any request with multiple Cookie keys of the same name at the
    # auth layer.

Multipart is the worst transport for HPP
Multipart bodies have three places where duplicate names appear: in name= on a part, in filename=, and in nested form data when the part itself is JSON. Different frameworks fold these differently — Spring exposes them as List<MultipartFile>, Express via multer emits arrays, Django merges into QueryDict, .NET emits a FormCollection. Any code that mixes "the file" with "the metadata about the file" across two of these representations is one rename away from a critical HPP bug.
An OAuth redirect handler builds a redirect URL by appending parameters to a stored URL: `redirect_url + '?state=' + state + '&code=' + code`. The stored URL already contains a `state` parameter. What can an attacker do?
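One way to avoid the duplicate in a handler like this is to rebuild the query string instead of concatenating onto it. The sketch below uses Python's `urllib.parse`; `with_params` is a hypothetical helper, and dropping the stored copy of a parameter is only one possible policy (rejecting the stored URL outright is another).

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_params(stored_url: str, **new_params: str) -> str:
    """Return stored_url with new_params set, replacing any existing copies."""
    parts = urlsplit(stored_url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in new_params
    ]
    # urlencode percent-encodes values, so "&" or "=" inside a value
    # cannot introduce an extra parameter.
    query = urlencode(kept + list(new_params.items()))
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))


print(with_params("https://app.example/cb?state=evil&next=%2Fhome", state="s1", code="c1"))
# https://app.example/cb?next=%2Fhome&state=s1&code=c1
```

Because every name appears exactly once in the result, first-wins and last-wins consumers of the redirect read the same `state`.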
Code-Review Checklist
HPP review is a structural discipline. You are not looking for a single line of code; you are looking for two places that read the same input under different parser policies. The checklist below is the questionnaire to run on every PR that touches request handling, authorization, URL construction, or proxy/CDN configuration.
- A handler reads `request.args.get(...)` / `req.query.x` / `params[:x]` for security purposes without asserting that the value is a scalar. If duplicates appear, the type may flip from string to array (Express) or the value may comma-join (Spring/.NET), and any equality check silently fails closed while a contains-check fails open.
- A request is checked at one layer (validator, decorator, middleware) and acted on at another using a different accessor: `get` vs `getAll`, `get` vs `getlist`, scalar binding vs `List<String>`.
- A redirect URL is built by string concatenation: `base_url + "?state=" + state`. Almost certainly an HPP smuggling surface.
- A cache layer (Cloudflare, Varnish, Fastly, Cloud CDN) sits in front of an origin that also reads query parameters. Verify that the cache key includes the full query string verbatim and that the cache normalisation rules match the origin's parser.
- A WAF rule that inspects parsed parameters runs in front of an origin with a different parser. Either reject duplicates at the WAF or rely on the origin instead of the WAF for the security check.
- A JSON body is signed in one service and re-parsed in another. If either parser tolerates duplicate keys, the signature does not bind what the consumer reads. Canonicalise (JCS, RFC 8785) before signing.
- A multipart handler that exposes form fields and file fields as separate APIs. Mass-assignment / file-metadata smuggling lives here.
- Header reads via `headers.get('X-Foo')` for security purposes: if the header can appear twice, the framework may concatenate or pick first/last differently than your proxy did.
- `parse_qs`, `querystring.parse`, `URLSearchParams`, or any string-based query-string parsing inside business logic. The request already came with a parsed structure; reusing the raw bytes risks reaching a different multiset.
- Strong-parameter / mass-assignment filters that allowlist by name. Make sure the allowlist runs against the same parsed object the ORM merges from, not a re-parse.
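For the header-read item, a rightmost-trusted walk over `X-Forwarded-For` can be sketched as follows. The proxy network in `TRUSTED_PROXIES` is an assumed deployment detail, and `client_ip` / `is_trusted` are hypothetical helpers; a production service would more likely rely on its framework's trusted-proxy middleware.

```python
import ipaddress

# Assumption for the illustration: all reverse proxies live in 10.0.0.0/8.
TRUSTED_PROXIES = [ipaddress.ip_network("10.0.0.0/8")]


def is_trusted(addr: str) -> bool:
    """True if addr belongs to one of our own proxies.

    Raises ValueError for a malformed address, which the caller should
    treat as a bad request.
    """
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in TRUSTED_PROXIES)


def client_ip(peer_addr: str, xff_headers: list) -> str:
    """Walk the forwarding chain from the right; stop at the first untrusted hop.

    peer_addr is the socket peer; xff_headers holds every X-Forwarded-For
    header line in arrival order, so duplicate headers are joined explicitly.
    """
    hops = [h.strip() for line in xff_headers for h in line.split(",") if h.strip()]
    chain = hops + [peer_addr]
    for addr in reversed(chain):
        if not is_trusted(addr):
            return addr
    # Every hop was one of our proxies: fall back to the oldest entry.
    return chain[0]
```

Entries to the left of the first untrusted hop are never read, so a forged leading value such as `8.8.8.8` has no influence on the result.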
Greppable smells
    # Note: ripgrep patterns are regexes by default. Do not pass -E, which
    # in ripgrep means --encoding and would swallow the pattern.

    # Concatenated redirect URLs.
    rg -n 'redirect.*\+.*=' --type js --type ts --type py --type ruby
    rg -n 'res\.redirect\([^)]*\+' --type js --type ts

    # Re-parsing the raw query string inside business logic.
    rg -n 'parse_qs\(|parseQuery\(|URLSearchParams\(' --type py --type js --type ts

    # Header reads without explicit dedup.
    rg -n "request\.headers\.get\(['\x22]X-" --type py --type js --type ts

    # Spring @RequestParam bound to String for a parameter that "could be" a list.
    rg -n '@RequestParam.*String\s+\w+' --type java

    # Express handlers that index req.query without type-checking.
    rg -n 'req\.query\.[a-zA-Z_]+' --type js --type ts | rg -v 'typeof'

    # Django/Flask files that use .get(...) and never .getlist(...).
    rg -l 'request\.GET\.get\(|request\.args\.get\(' --type py | xargs rg --files-without-match 'getlist'

    # Mass-assignment patterns that merge raw params into models.
    rg -n 'Object\.assign\(.*req\.body|\.\.\.req\.body' --type js --type ts
    rg -n 'permit\(\*' --type ruby

The one-question review
"Does the layer that makes the security decision read this parameter from the same parsed structure as the layer that performs the action, with the same first/last/array policy?" If the answer is anything other than "yes, demonstrably," you have an HPP surface. Like its parser-differential cousin, this question catches almost every real-world bug in this class.
Reference: a parameter middleware that fails closed
    // Express middleware. Mount this AFTER the body parsers and BEFORE any
    // route handler. Reject any request that contains a duplicate parameter
    // unless the route explicitly opts in by listing the parameter name in
    // an allowlist.

    function strictSingleParams(arrayAllowlist = []) {
      const allowed = new Set(arrayAllowlist);
      // qs turns duplicates into arrays and bracket syntax (a[b]=1) into
      // objects, so anything non-scalar counts as a violation. Routes that
      // accept nested JSON must allowlist those keys.
      const isNonScalar = (v) => v !== null && typeof v === "object";
      return function (req, res, next) {
        const violations = [];
        for (const [k, v] of Object.entries(req.query)) {
          if (isNonScalar(v) && !allowed.has(k)) violations.push(`query.${k}`);
        }
        if (req.body && typeof req.body === "object") {
          for (const [k, v] of Object.entries(req.body)) {
            if (isNonScalar(v) && !allowed.has(k)) violations.push(`body.${k}`);
          }
        }
        if (violations.length) {
          return res.status(400).json({
            error: "duplicate or non-scalar parameters not allowed",
            offending: violations,
          });
        }
        next();
      };
    }

    // Usage:
    //   app.use(strictSingleParams(["tags", "ids"]));  // explicit array params
    //   // every other parameter must be a scalar; any duplicate is a 400.

    // Why this works: HPP exploits depend on the attacker being able to slip
    // duplicates past a layer that wasn't expecting them. If duplicates are
    // rejected at the edge, downstream layers can safely treat all values as
    // scalars without writing defensive checks at every call site.