XPath Injection: Code Review Guide
1. Introduction to XPath Injection
XPath injection occurs when user input is embedded directly into an XPath query without sanitization. XPath is a query language for selecting nodes from XML documents — analogous to SQL for relational databases. When applications use XML as a data store or process XML documents with user-controlled query parameters, attackers can manipulate the XPath expression to bypass authentication, extract sensitive data, or enumerate the entire XML document structure.
Why XPath Injection Matters
Unlike SQL injection, XPath injection has no permission model to contain it. XPath has no tables, users, or access controls, and nothing equivalent to SQL's GRANT/REVOKE, so a successful injection can reach any node in the XML document. In addition, although XPath 1.0 defines variable references ($name), library support for binding them safely varies, which makes prevention patterns less standardized than for SQL injection. XPath injection falls under OWASP's injection category and is particularly common in legacy SOAP services, SAML implementations, and applications using XML-based configuration.
In this guide, you'll learn how to spot XPath injection patterns during code review, understand why XML-based authentication is particularly vulnerable, see how attackers exploit both in-band and blind XPath injection, review language-specific vulnerable patterns in Java, .NET, Python, PHP, and Node.js, and implement effective prevention strategies including parameterized XPath queries.
[Diagrams not reproduced here: XPath Injection Attack Flow; How XPath Injection Works; Where XPath Injection Appears]
A developer argues that XPath injection is not a concern because 'we only use XML for configuration, not as a database.' Why is this reasoning flawed?
2. Vulnerable Code Patterns
XPath injection occurs whenever user input is concatenated into an XPath expression string. The most critical pattern is XML-based authentication, where username and password are embedded directly into an XPath query to look up users.
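The mechanics can be sketched without an XML library. The snippet below is a minimal illustration, not a real authentication routine: `build_login_xpath`, `short_payload_matches`, and `long_payload_matches` are hypothetical helpers. The first shows what the expression string looks like after concatenation. The other two hand-evaluate the injected predicate for a single user node, relying on the fact that XPath, like Python, binds `and` more tightly than `or`.

```python
# Hypothetical helpers for illustration; not taken from any library.

def build_login_xpath(username: str, password: str) -> str:
    """Deliberately vulnerable: splices raw input into the expression."""
    return f"//users/user[name='{username}' and password='{password}']"


def short_payload_matches(name: str, stored_pw: str, supplied_pw: str) -> bool:
    """Predicate produced by username = ' or '1'='1

    name='' or '1'='1' and password='<supplied>'
    parses as: name='' or ('1'='1' and password='<supplied>')
    """
    return name == "" or (True and stored_pw == supplied_pw)


def long_payload_matches(name: str, stored_pw: str, supplied_pw: str) -> bool:
    """Predicate produced by username = ' or '1'='1' or '1'='1

    parses as: name='' or '1'='1' or ('1'='1' and password='<supplied>')
    """
    return name == "" or True or (True and stored_pw == supplied_pw)


injected = build_login_xpath("' or '1'='1' or '1'='1", "anything")
print(injected)

# One stored user, wrong password supplied by the attacker.
print(short_payload_matches("alice", "s3cret", "anything"))
print(long_payload_matches("alice", "s3cret", "anything"))
```

The extra `or '1'='1'` clause matters: it gives the predicate a branch that is true on its own, so the trailing password check no longer decides the result.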
Java: Classic XPath injection in authentication

    // ❌ VULNERABLE: User credentials embedded in XPath query
    import javax.xml.xpath.*;
    import org.w3c.dom.Document;
    import org.w3c.dom.NodeList;

    // `document` is assumed to be a field holding the parsed users XML
    public boolean authenticate(String username, String password)
            throws XPathExpressionException {
        XPathFactory factory = XPathFactory.newInstance();
        XPath xpath = factory.newXPath();

        // User input directly concatenated into query
        String expression = "//users/user[name='" + username
            + "' and password='" + password + "']";

        NodeList result = (NodeList) xpath.evaluate(
            expression, document, XPathConstants.NODESET
        );

        return result.getLength() > 0;
        // Attacker input: username = ' or '1'='1' or '1'='1
        // Resulting query: //users/user[name='' or '1'='1' or '1'='1'
        //                  and password='anything']
        // Returns ALL user nodes → authentication bypassed
    }

C#/.NET: Vulnerable XPath lookup

    // ❌ VULNERABLE: .NET XPath with string concatenation
    using System.Xml;
    using System.Xml.Linq;   // XDocument, XElement
    using System.Xml.XPath;  // XPathDocument, XPathNavigator, XPathSelectElement

    public XmlNode FindUser(string username, string password) {
        XmlDocument doc = new XmlDocument();
        doc.Load("users.xml");

        // Direct interpolation of user input
        string query = $"//user[username='{username}' and password='{password}']";
        XmlNode user = doc.SelectSingleNode(query);
        return user;
    }

    // ❌ VULNERABLE: XPathNavigator with user input
    public bool ValidateCredentials(string user, string pass) {
        XPathDocument doc = new XPathDocument("users.xml");
        XPathNavigator nav = doc.CreateNavigator();

        string xpath = "//accounts/account[user='" + user
            + "' and pass='" + pass + "']";
        XPathNodeIterator iter = nav.Select(xpath);
        return iter.Count > 0;
    }

    // ❌ VULNERABLE: LINQ to XML with XPath
    public XElement GetConfig(string key) {
        XDocument doc = XDocument.Load("config.xml");
        // User controls 'key' parameter
        return doc.XPathSelectElement($"//settings/setting[@name='{key}']");
    }

Python: Vulnerable lxml and ElementTree patterns

    # ❌ VULNERABLE: lxml XPath with string formatting
    from lxml import etree

    def find_user(username, password):
        tree = etree.parse("users.xml")
        # f-string directly embeds user input
        query = f"//user[name='{username}' and pass='{password}']"
        results = tree.xpath(query)
        return len(results) > 0

    # ❌ VULNERABLE: ElementTree with .find() / .findall()
    import xml.etree.ElementTree as ET

    def get_product(product_id):
        tree = ET.parse("products.xml")
        root = tree.getroot()
        # User controls product_id
        product = root.find(f".//product[@id='{product_id}']")
        return product

    # ❌ VULNERABLE: String concatenation with % formatting
    def search_xml(search_term):
        tree = etree.parse("data.xml")
        query = "//item[contains(name, '%s')]" % search_term
        return tree.xpath(query)

PHP: Vulnerable DOMXPath patterns

    <?php
    // ❌ VULNERABLE: PHP DOMXPath with user input
    $doc = new DOMDocument();
    $doc->load('users.xml');
    $xpath = new DOMXPath($doc);

    $username = $_POST['username'];
    $password = $_POST['password'];

    // Direct variable interpolation
    $query = "//user[login='$username' and password='$password']";
    $entries = $xpath->query($query);

    if ($entries->length > 0) {
        // Authentication bypassed with username: ' or '1'='1' or '1'='1
        // (the extra "or" clause is needed because "and" binds tighter than "or")
        echo "Login successful";
    }

    // ❌ VULNERABLE: SimpleXML with xpath()
    $xml = simplexml_load_file('products.xml');
    $search = $_GET['q'];
    $results = $xml->xpath("//product[contains(name, '$search')]");
    ?>

Common XPath Injection Entry Points
| Entry Point | Context | Risk Level |
|---|---|---|
| Login forms | XML-based user authentication | Critical |
| Search functionality | XPath queries over XML content | High |
| SOAP service parameters | XML message processing | High |
| Configuration lookups | Dynamic config key retrieval | High |
| SAML assertion processing | Attribute queries on SAML XML | Critical |
| REST API filters | XML data store queries | High |
| Content management | XML-based CMS content retrieval | Medium |
| Report generation | XML data source queries | Medium |
Given this XPath query: //users/user[name='$username' and password='$password'], what input for username bypasses authentication WITHOUT needing a valid password?
3. Exploitation Techniques
XPath injection can be exploited in multiple ways depending on whether the attacker can see query results directly (in-band) or must infer data through boolean conditions (blind). Unlike SQL, XPath provides built-in functions that make data extraction straightforward once injection is confirmed.
Authentication bypass payloads

    # Classic OR-based bypass (username field)
    # The first form only works where no "and ..." clause follows the
    # injection point, or when sent in both fields, because "and" binds
    # tighter than "or".
    ' or '1'='1
    ' or '1'='1' or '1'='1
    ' or 1=1 or '
    admin' or '1'='1

    # XPath 1.0 has no comment syntax, so the payload must absorb the
    # trailing quote instead of commenting it out
    ' or true() or '

    # Extract first user (useful when app returns user data)
    ' or position()=1 or '1'='1

    # Bypass when single quotes are filtered
    # (only applies if the query delimits the value with double quotes)
    " or "1"="1

    # Target specific user
    ' or name='admin' or '1'='1

Data extraction via UNION-style XPath

    # XPath doesn't have UNION, but the | operator merges nodesets.
    # If you can control the full expression or break out of one:

    # Extract all users (when injected into a user lookup)
    '] | //user | //user['

    # Resulting query:
    # //users/user[name=''] | //user | //user['']
    # The | operator returns the union of all three nodesets

    # Extract passwords specifically
    '] | //user/password | //x['

    # Navigate to parent/sibling nodes
    ']/parent::*/child::* | //x['

    # Extract the root element and all descendants
    '] | /* | //x['

XPath functions useful for injection

    # String functions for data extraction
    string()          - Convert node to string
    string-length()   - Get string length (useful for blind extraction)
    substring()       - Extract characters: substring(string, pos, len)
    concat()          - Concatenate strings
    contains()        - Check if string contains substring
    starts-with()     - Check string prefix
    normalize-space() - Normalize whitespace

    # Node navigation
    count()    - Count nodes: count(//user) reveals user count
    name()     - Get element name: name(/*) reveals root element
    position() - Current node position in set
    last()     - Last position in set

    # Boolean functions
    true()  - Always true (injection helper)
    false() - Always false
    not()   - Negate condition

    # Example: Extract root element name, one character at a time
    ' or substring(name(/*),1,1)='a' or '1'='2
    # Vary the character, then the position, until the app returns true
    # → leaks the root element name letter by letter

    # Example: Count users
    ' or count(//user)=5 or '1'='2
    # Returns true only if there are exactly 5 users

No Access Controls in XPath
A critical difference between XPath and SQL injection: XPath has no concept of permissions or access controls. In SQL, an injected query runs with the database user's privileges — a properly configured DB user might only have SELECT access to certain tables. In XPath, once you can inject, you can query every node in the entire XML document. There are no schemas, no privileges, no row-level security. Functions like count(), string(), and substring() give the attacker tools to fully enumerate and extract the document.
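To make the enumeration risk concrete for reviewers, here is a minimal sketch of how `string-length()` and `substring()` turn a yes/no response into full data extraction. Everything in it is an assumption for illustration: `extract`, `length_payload`, and `char_payload` are hypothetical names, and `make_fake_oracle` is a local stand-in for a vulnerable application that only reveals whether the injected condition was true. No XML engine or network access is involved.

```python
import re
import string
from typing import Callable

# Characters to try at each position (an assumption; widen as needed).
ALPHABET = string.ascii_lowercase + string.digits + "_-"


def length_payload(target: str, n: int) -> str:
    """True only when the target string is exactly n characters long."""
    return f"' or string-length({target})={n} or '1'='2"


def char_payload(target: str, pos: int, ch: str) -> str:
    """True only when the character at 1-based position pos equals ch."""
    return f"' or substring({target},{pos},1)='{ch}' or '1'='2"


def extract(target: str, oracle: Callable[[str], bool], max_len: int = 64) -> str:
    """Recover a string using only boolean answers from the oracle."""
    length = next(
        (n for n in range(1, max_len + 1) if oracle(length_payload(target, n))), 0
    )
    found = []
    for pos in range(1, length + 1):
        # "?" marks a character outside ALPHABET.
        found.append(
            next((ch for ch in ALPHABET if oracle(char_payload(target, pos, ch))), "?")
        )
    return "".join(found)


def make_fake_oracle(secret: str) -> Callable[[str], bool]:
    """Stand-in for the vulnerable app: answers the two payload shapes above."""
    def oracle(payload: str) -> bool:
        m = re.search(r"string-length\(.+\)=(\d+)", payload)
        if m:
            return len(secret) == int(m.group(1))
        m = re.search(r"substring\(.+,(\d+),1\)='(.)'", payload)
        if m:
            pos, ch = int(m.group(1)), m.group(2)
            return secret[pos - 1:pos] == ch
        return False
    return oracle


# The fake oracle pretends the document's root element is <users>.
print(extract("name(/*)", make_fake_oracle("users")))
```

Each character costs at most `len(ALPHABET)` requests, so even a "Results found / No results" response is enough to walk the whole document given time.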
An attacker has confirmed XPath injection in a search field but the application only displays 'Results found' or 'No results.' What technique allows data extraction?
4. Language-Specific Patterns
Each language and XML library has different XPath APIs with varying levels of support for parameterized queries. Understanding the specific patterns in your tech stack is essential for effective code review.
Java: Vulnerable vs. secure XPath

    // ❌ VULNERABLE: String concatenation in XPath
    XPath xpath = XPathFactory.newInstance().newXPath();
    String expr = "//user[name='" + userInput + "']";
    NodeList nodes = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);

    // ✅ SECURE: Using XPathVariableResolver for parameterized queries
    XPath safeXpath = XPathFactory.newInstance().newXPath();

    safeXpath.setXPathVariableResolver(variableName -> {
        if ("username".equals(variableName.getLocalPart())) {
            return userInput;
        }
        if ("password".equals(variableName.getLocalPart())) {
            return passInput;
        }
        return null;
    });

    // $username and $password are bound as values, never parsed as XPath syntax
    String safeExpr = "//user[name=$username and password=$password]";
    NodeList safeNodes = (NodeList) safeXpath.evaluate(safeExpr, doc, XPathConstants.NODESET);

    // ✅ SECURE: Input validation + escaping as defense-in-depth
    public static String escapeXPathValue(String input) {
        if (!input.contains("'")) {
            return "'" + input + "'";
        }
        if (!input.contains("\"")) {
            return "\"" + input + "\"";
        }
        // Contains both quote types: split on ' and rejoin with concat()
        StringBuilder sb = new StringBuilder("concat(");
        String[] parts = input.split("'", -1); // -1 keeps trailing empty strings
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) sb.append(", \"'\", ");
            sb.append("'").append(parts[i]).append("'");
        }
        sb.append(")");
        return sb.toString();
    }

Python: lxml parameterized XPath (recommended)

    # ❌ VULNERABLE: String formatting
    from lxml import etree

    tree = etree.parse("users.xml")
    query = f"//user[name='{username}']"
    results = tree.xpath(query)

    # ✅ SECURE: lxml supports XPath variables natively
    # Variables are passed as keyword arguments and bound as values.
    # "pass" is a reserved word in Python, so the variable is named $pw.
    results = tree.xpath(
        "//user[name=$name and password=$pw]",
        name=username,
        pw=password,
    )
    # $name and $pw are treated as string values, not XPath syntax
    # Injection attempt: username = "' or '1'='1"
    # lxml compares name against that literal string
    # No nodes match → injection fails

    # ✅ SECURE: ElementTree (limited XPath, no variables) with strict validation
    import re
    import xml.etree.ElementTree as ET

    tree = ET.parse("users.xml")
    root = tree.getroot()

    def safe_find(root, tag, attr, value):
        # tag and attr must be hard-coded by the caller, never user input.
        # Strict allowlist validation; fullmatch also rejects a trailing
        # newline, which re.match with a $ anchor would let through.
        if not re.fullmatch(r"[a-zA-Z0-9_@.\-]+", value):
            raise ValueError("Invalid search value")
        return root.findall(f".//{tag}[@{attr}='{value}']")

Node.js: xpath and libxmljs patterns

    // ❌ VULNERABLE: xpath npm package with string concatenation
    const xpath = require('xpath');
    const { DOMParser } = require('xmldom');

    const doc = new DOMParser().parseFromString(xmlString, 'text/xml');
    const query = "//user[name='" + userInput + "']";
    const nodes = xpath.select(query, doc);

    // ✅ SECURE: Escape the value before embedding it in the expression
    function escapeXPathString(str) {
        if (!str.includes("'")) {
            return "'" + str + "'";
        }
        if (!str.includes('"')) {
            return '"' + str + '"';
        }
        // Contains both quote types: split on ' and rejoin with concat()
        return "concat(" + str.split("'").map((part, i) =>
            (i > 0 ? ", \"'\", " : "") + "'" + part + "'"
        ).join("") + ")";
    }

    const safeQuery = "//user[name=" + escapeXPathString(userInput) + "]";
    const safeNodes = xpath.select(safeQuery, doc);

    // ✅ BETTER: Avoid XPath entirely and use DOM methods
    function findUser(doc, username) {
        const users = doc.getElementsByTagName('user');
        for (let i = 0; i < users.length; i++) {
            const nameEl = users[i].getElementsByTagName('name')[0];
            if (nameEl && nameEl.textContent === username) {
                return users[i];
            }
        }
        return null;
    }

PHP: Secure XPath patterns

    <?php
    // ❌ VULNERABLE: Direct interpolation
    $query = "//user[login='$username' and password='$password']";
    $entries = $xpath->query($query);

    // ✅ SECURE: Escape XPath values before embedding
    function escapeXPathValue(string $value): string {
        if (strpos($value, "'") === false) {
            return "'" . $value . "'";
        }
        if (strpos($value, '"') === false) {
            return '"' . $value . '"';
        }
        // Contains both quote types: split on ' and rejoin with concat()
        $parts = explode("'", $value);
        $escaped = "concat(";
        foreach ($parts as $i => $part) {
            if ($i > 0) $escaped .= ", \"'\", ";
            $escaped .= "'" . $part . "'";
        }
        $escaped .= ")";
        return $escaped;
    }

    $safeUser = escapeXPathValue($username);
    $safePass = escapeXPathValue($password);
    $query = "//user[login={$safeUser} and password={$safePass}]";
    $entries = $xpath->query($query);

    // Optional: expose one allowlisted PHP function for custom comparison.
    // Always pass an explicit list; calling registerPhpFunctions() with no
    // arguments makes every PHP function callable from XPath.
    $xpath->registerNamespace('php', 'http://php.net/xpath');
    $xpath->registerPhpFunctions(['preg_match']);
    // ✅ BETTER: Avoid XPath for auth entirely and use a proper database
    ?>

Parameterized XPath Support by Language
Java: XPathVariableResolver provides native parameterized queries. Python/lxml: Variables passed as keyword arguments to .xpath() are bound as values rather than spliced into the expression text. .NET: XsltArgumentList can pass parameters in XSLT, but SelectSingleNode and SelectNodes take a plain expression string, so values need manual escaping. PHP: No built-in parameterization; use the concat() escaping pattern. Node.js: No built-in parameterization in common xpath packages; escape manually or use DOM methods instead.
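For stacks that fall back to the concat() escaping pattern, the same logic can be sketched in Python. This is an illustrative port of the escaping functions shown above, not a library API; the name `xpath_string_literal` is hypothetical. XPath 1.0 string literals have no escape character, so a value containing both quote types has to be split and reassembled with `concat()`.

```python
def xpath_string_literal(value: str) -> str:
    """Return an XPath 1.0 expression that evaluates to exactly `value`."""
    if "'" not in value:
        return f"'{value}'"
    if '"' not in value:
        return f'"{value}"'
    # Both quote types present: single-quote each chunk and join the
    # chunks with a double-quoted apostrophe inside concat().
    chunks = (f"'{part}'" for part in value.split("'"))
    return "concat(" + ", \"'\", ".join(chunks) + ")"


print(xpath_string_literal("alice"))
print(xpath_string_literal("O'Brien"))
print(xpath_string_literal("a'b\"c"))

# Usage: the escaped literal replaces the quoted value, quotes included.
query = "//user[name=" + xpath_string_literal("' or '1'='1") + "]"
print(query)
```

Python's `str.split` keeps leading and trailing empty strings, so a value that starts or ends with an apostrophe still round-trips correctly.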
A Python developer uses lxml and writes: tree.xpath(f\
5. Detection During Code Review
Detecting XPath injection during code review requires identifying two things: (1) where XPath queries are constructed and (2) whether user input flows into those queries without parameterization or escaping.
XPath Sinks by Language
| Language | XPath Sink | Notes |
|---|---|---|
| Java | XPath.evaluate(), XPath.compile() | Use XPathVariableResolver for safe queries |
| C#/.NET | SelectSingleNode(), SelectNodes(), XPathNavigator.Select() | No built-in parameterization |
| Python | lxml etree.xpath(), ET.find(), ET.findall() | lxml supports $var parameters |
| PHP | DOMXPath::query(), SimpleXMLElement::xpath() | No built-in parameterization |
| Node.js | xpath.select(), xpath.evaluate() | No built-in parameterization |
| Ruby | Nokogiri::XML::Node#xpath(), #at_xpath() | Supports $var parameters |
| Go | xmlpath.MustCompile(), xmlquery.Find() | No built-in parameterization |
Grep patterns to find potential XPath injection

    # Search for XPath evaluation with potential string concat
    rg "xpath\.evaluate|xpath\.compile|XPathFactory" --type java
    rg "SelectSingleNode|SelectNodes|XPathNavigator" --type-add 'cs:*.cs' --type cs
    rg "\.xpath\(|etree\.XPath|\.findall\(" --type py
    rg "DOMXPath|->query\(|->xpath\(" --type php
    rg "xpath\.select|xpath\.evaluate" --type js --type ts

    # Look for string concatenation near XPath calls
    # (single quotes stop the shell from consuming the backslash before $)
    rg "xpath.*\+.*req\.|xpath.*\+.*param|xpath.*\+.*input" --type java
    rg 'xpath.*\$.*\{|xpath.*format|xpath.*%s' --type py
    rg 'xpath.*\$_GET|xpath.*\$_POST|xpath.*\$_REQUEST' --type php

    # Search for f-string or format-string XPath queries in Python
    rg "xpath\(f['\"]|xpath\(.*\.format\(" --type py

    # Find XML parsing that might be combined with XPath
    rg "DocumentBuilder|SAXParser|DOMParser" --type java
    rg "etree\.parse|etree\.fromstring|minidom\.parse" --type py
    rg "DOMDocument|simplexml_load" --type php

- Trace all XPath expression construction: Any XPath expression built with string concatenation, interpolation, or formatting that includes user input is a potential injection point.
- Check for parameterized query usage: Verify that the language/library's parameterized XPath mechanism is used (Java: XPathVariableResolver, Python/lxml: keyword args, Ruby/Nokogiri: variable hash).
- Review XML authentication patterns: XML-based login systems are the highest-risk pattern. Search for XPath queries that check both username and password.
- Inspect SOAP service handlers: SOAP services that parse XML requests and use values in XPath queries are a common attack surface.
- Look for SAML XPath processing: SAML assertions are XML documents; any XPath over assertion attributes with user-controlled values is dangerous.
- Verify escaping functions: If manual escaping is used instead of parameterization, check that it handles both single and double quotes, and uses the concat() pattern for values containing both.
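The first checklist item can be partly automated. The following is a rough triage sketch, assuming a line-based heuristic is acceptable as a first pass: it flags lines where a string literal that looks like an XPath expression (starting with `//` or `.//`) appears together with a sign of dynamic string building. `Finding`, `scan`, and both regexes are hypothetical and intentionally simple. They miss expressions assembled across several lines and PHP-style `$var` interpolation, which is indistinguishable from an XPath variable on a single line.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class Finding(NamedTuple):
    line_no: int
    text: str


# A quoted string that starts like an XPath location path.
XPATH_LITERAL = re.compile(r"""["']\.?//\w""")

# Signs that a string is being assembled from other values.
DYNAMIC_STRING = re.compile(
    r"""
      \bf["']          # Python f-string prefix
    | \$["']           # C# interpolated string prefix
    | \$\{             # JavaScript template placeholder
    | %s | \.format\(  # printf-style and str.format
    | ["']\s*\+        # "literal" + value
    | \+\s*["']        # value + "literal"
    """,
    re.VERBOSE,
)


def scan(lines: Iterable[str]) -> Iterator[Finding]:
    """Yield lines that build an XPath-looking string dynamically."""
    for line_no, text in enumerate(lines, start=1):
        if XPATH_LITERAL.search(text) and DYNAMIC_STRING.search(text):
            yield Finding(line_no, text.strip())


SAMPLE = '''
query = f"//user[name='{username}']"
results = tree.xpath("//user[name=$name]", name=username)
String expr = "//user[name='" + userInput + "']";
'''.strip().splitlines()

for finding in scan(SAMPLE):
    print(f"line {finding.line_no}: {finding.text}")
```

In the sample, the parameterized lxml call on the second line is not flagged, while the f-string and the Java concatenation are. Every hit still needs manual tracing to confirm that the interpolated value is user-controlled.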
During code review, you find a Java method that builds an XPath query using string concatenation but the developer points out they've added input validation: `if (input.contains("'")) throw new Exception("Invalid input");`. Is this sufficient?
6. Prevention Strategies
The strongest defense against XPath injection is using parameterized XPath queries where the library treats user input as data, not as part of the XPath expression syntax. When parameterized queries are not available, proper escaping with the concat() pattern is the next best option.
Prevention Strategy 1: Parameterized XPath (Java)

    // ✅ MOST SECURE: XPathVariableResolver
    import java.util.Map;
    import javax.xml.xpath.*;
    import org.w3c.dom.Document;
    import org.w3c.dom.NodeList;

    public class SafeXPathQuery {
        private final XPath xpath;
        private final Document document;

        public SafeXPathQuery(Document document) {
            this.document = document;
            XPathFactory factory = XPathFactory.newInstance();
            this.xpath = factory.newXPath();
        }

        public NodeList findUser(String username, String password) throws XPathExpressionException {
            // Variables are bound separately from the expression
            final Map<String, String> variables = Map.of(
                "user", username,
                "pass", password
            );

            xpath.setXPathVariableResolver(
                name -> variables.get(name.getLocalPart())
            );

            // $user and $pass are bound as string values, not parsed as XPath
            XPathExpression expr = xpath.compile(
                "//users/user[name=$user and password=$pass]"
            );

            return (NodeList) expr.evaluate(document, XPathConstants.NODESET);
        }
    }

Prevention Strategy 2: Parameterized XPath (Python/lxml)

    # ✅ MOST SECURE: lxml parameterized XPath
    from lxml import etree

    def find_user(tree, username, password):
        # Keyword arguments are bound as XPath variables by lxml.
        # "pass" is a reserved word in Python, so the variable is named $pw.
        return tree.xpath(
            "//user[name=$name and password=$pw]",
            name=username,
            pw=password,
        )

    def search_products(tree, search_term, category):
        return tree.xpath(
            "//product[contains(name, $term) and @category=$cat]",
            term=search_term,
            cat=category,
        )

    # ✅ ALTERNATIVE: Precompiled XPath with variables
    from lxml.etree import XPath

    find_user_xpath = XPath(
        "//user[name=$name and password=$pw]"
    )

    def find_user_compiled(tree, username, password):
        return find_user_xpath(
            tree,
            name=username,
            pw=password,
        )

Prevention Strategy 3: Escape function + DOM methods (Node.js)

    // ✅ SECURE: Robust XPath value escaping
    function escapeXPathString(value) {
        if (typeof value !== 'string') {
            throw new TypeError('XPath value must be a string');
        }

        const hasSingle = value.includes("'");
        const hasDouble = value.includes('"');

        if (!hasSingle) return "'" + value + "'";
        if (!hasDouble) return '"' + value + '"';

        // Value contains both quote types: use concat()
        const parts = value.split("'");
        return "concat(" +
            parts.map((part, i) =>
                (i > 0 ? ", \"'\", " : "") + "'" + part + "'"
            ).join("") +
            ")";
    }

    // Usage
    const safeName = escapeXPathString(userInput);
    const query = `//user[name=${safeName}]`;

    // ✅ BEST: Avoid XPath entirely and use DOM traversal
    function findUserByName(doc, targetName) {
        const users = doc.getElementsByTagName('user');
        // Index loop: works even where the node list is not iterable
        for (let i = 0; i < users.length; i++) {
            const name = users[i].getElementsByTagName('name')[0];
            if (name?.textContent === targetName) {
                return users[i];
            }
        }
        return null;
    }

Prevention Strategy Comparison
| Strategy | Security Level | Availability | Best For |
|---|---|---|---|
| Parameterized XPath ($var) | Highest — structural separation | Java, Python/lxml, Ruby/Nokogiri | All XPath queries where supported |
| Escape with concat() pattern | High — handles all quote types | All languages (manual) | Languages without parameterization |
| DOM traversal instead of XPath | Highest — eliminates XPath entirely | All languages | Simple lookups where XPath is overkill |
| Input validation (allowlist) | Medium — defense-in-depth | All languages | Additional layer, never sole defense |
| Blocklist (strip quotes) | Low — easily bypassed | All languages | Not recommended as primary defense |
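The low rating for blocklists can be shown at the string level. In this minimal sketch all function names are hypothetical. A filter that rejects only single quotes is applied to two query shapes that never needed a single quote to break out of: an unquoted numeric comparison and a double-quoted string. A strict allowlist for the numeric case is included for contrast.

```python
import re


def reject_single_quotes(value: str) -> str:
    """Blocklist filter: rejects only the single-quote character."""
    if "'" in value:
        raise ValueError("Invalid input")
    return value


def numeric_lookup(product_id: str) -> str:
    """Vulnerable on purpose: value lands in an unquoted numeric context."""
    return f"//product[@id={product_id}]"


def double_quoted_lookup(name: str) -> str:
    """Vulnerable on purpose: value is delimited with double quotes."""
    return f'//user[name="{name}"]'


def require_digits(value: str) -> str:
    """Allowlist filter: accepts ASCII digits only."""
    if not re.fullmatch(r"[0-9]+", value):
        raise ValueError("Invalid input")
    return value


# Both payloads pass the blocklist and still rewrite the predicate.
numeric = numeric_lookup(reject_single_quotes("1 or 1=1"))
quoted = double_quoted_lookup(reject_single_quotes('" or "1"="1'))
print(numeric)
print(quoted)

# The allowlist stops the numeric payload before it reaches the query.
try:
    numeric_lookup(require_digits("1 or 1=1"))
except ValueError as exc:
    print("rejected:", exc)
```

Even the allowlist only covers this one numeric field, which is why the table treats validation as an additional layer rather than a replacement for parameterization or escaping.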
Migrate Away from XML Data Stores
The most effective long-term strategy is to stop using XML as a data store. If your application uses XML files for user authentication, configuration lookups, or data persistence, migrating to a proper database with parameterized queries (SQL, NoSQL) or structured configuration formats (environment variables, JSON with schema validation) eliminates the XPath injection attack surface entirely. For new applications, never use XML as a user database.
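As a rough picture of what the migrated lookup might look like, the sketch below replaces the XML user file with a SQLite table and a parameterized query. It is an assumption-laden illustration, not a drop-in design: the schema, the function names, the in-memory database, and the PBKDF2 iteration count are all chosen for the example. Passwords are stored as salted hashes, so the query selects by name only and the comparison happens in application code.

```python
import hashlib
import hmac
import os
import sqlite3

PBKDF2_ROUNDS = 100_000  # illustrative value; tune for your environment


def hash_password(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, PBKDF2_ROUNDS)


def create_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users ("
        "name TEXT PRIMARY KEY, salt BLOB NOT NULL, pw_hash BLOB NOT NULL)"
    )
    return conn


def add_user(conn: sqlite3.Connection, name: str, password: str) -> None:
    salt = os.urandom(16)
    conn.execute(
        "INSERT INTO users (name, salt, pw_hash) VALUES (?, ?, ?)",
        (name, salt, hash_password(password, salt)),
    )


def authenticate(conn: sqlite3.Connection, name: str, password: str) -> bool:
    # The ? placeholder keeps `name` as data; it is never parsed as SQL.
    row = conn.execute(
        "SELECT salt, pw_hash FROM users WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        return False
    salt, stored_hash = row
    # Constant-time comparison of the stored and recomputed hashes.
    return hmac.compare_digest(stored_hash, hash_password(password, salt))


conn = create_store()
add_user(conn, "alice", "s3cret")
print(authenticate(conn, "alice", "s3cret"))
print(authenticate(conn, "' or '1'='1' or '1'='1", "anything"))
```

The injection string from earlier sections is simply looked up as a user name that does not exist, so the second call fails.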
Your team uses Python with lxml. Which approach is most secure for querying XML with user-supplied values?