dtm ([personal profile] dtm) wrote 2007-04-15 11:42 am

Most HTML templating languages are written incorrectly

Continuing my pattern of occasional technical posts just so that my journal won't be completely dormant, here's another one:

If you do much web development at all, you probably work with a template language of some kind. You know, the kind of thing where you write HTML with various placeholders in spots that get filled in by the web application - examples include JSP pages, Django's template system, Smarty templates, PHP pages, or HTML::Mason.

Anyway, the problem with virtually every HTML templating language out there is that they make it easier for the person writing HTML templates to add an XSS hole than to avoid it. This isn't a matter of making it possible for page writers to shoot themselves in the foot - that's always going to be possible, given any reasonable system - it's a matter of making it easier to do than to avoid.

I'm going to pick JSP as an example, because it's just horrendous in this regard, especially in JSP 2.0 where they make doing the wrong thing extra easy. Let's say you have a page where you echo back to the user something they typed in. Say, you have a search page and you preface search results with something like:

Search results for “pirate monkeys”:

when the user types "pirate monkeys" into the search box. Now, in JSP you might well be tempted to code that as:

<b>Search results for &ldquo;${param.q}&rdquo;</b>:

And in your testing, this would appear to work just fine (assuming that the search box wound up in a parameter named q). However, if you did that you'd be opening yourself up to an XSS attack - in short, you'd be making it possible for someone to construct a URL that they can hand to your users to cause nasty things to happen. Instead, the correct way to write that is:

<b>Search results for &ldquo;<c:out value="${param.q}"/>&rdquo;</b>:

Now, which is easier to type? Which one is therefore more likely to be typed in a hurry when you've got to get the site up by the end of the week?
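To make the difference concrete, here's a rough sketch of the two behaviors in Python rather than JSP. The function names render_unsafe and render_safe are made up for this illustration; html.escape is the standard library's HTML-escaping function, standing in for what <c:out> does.

```python
from html import escape


def render_unsafe(query: str) -> str:
    # Verbatim interpolation, roughly what ${param.q} does in a JSP page.
    return f"<b>Search results for &ldquo;{query}&rdquo;</b>:"


def render_safe(query: str) -> str:
    # HTML-escaped interpolation, roughly what <c:out value="..."/> does.
    return f"<b>Search results for &ldquo;{escape(query)}&rdquo;</b>:"


# A query an attacker might embed in a link handed to your users.
attack = "<script>alert(1)</script>"

unsafe_html = render_unsafe(attack)  # the script tag reaches the browser intact
safe_html = render_safe(attack)      # the script tag is shown as literal text
```

For an innocent query like "pirate monkeys" the two functions return exactly the same string, which is why the unsafe version sails through testing.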

JSPs have another way of including dynamic content by putting snippets of Java code inside <%= %> tags, but it suffers from the same problem. The default behavior is to include the text verbatim, not doing any HTML escaping, and that's just wrong.

Other templating languages make it easier to properly escape output, but still tend to make it easier to include stuff verbatim (a rare need) than to include stuff properly escaped. For example, a snippet on the Django page linked to above gives:

{% for story in story_list %}
<h2>
<a href="{{ story.get_absolute_url }}">
{{ story.headline|upper }}
</a>
</h2>
<p>{{ story.tease|truncatewords:"100" }}</p>
{% endfor %}


Which is all fine and good if you never let your users submit story content. The XSS-proof version of that snippet is:

{% for story in story_list %}
<h2>
<a href="{{ story.get_absolute_url|escape }}">
{{ story.headline|upper|escape }}
</a>
</h2>
<p>{{ story.tease|truncatewords:"100"|escape }}</p>
{% endfor %}


They make it easier to escape variable values than JSP does, but don't make it the default. The other templating languages cited above suffer from a similar design flaw. HTML::Mason at least allows the site administrator to set a site-wide "default escape", so that on that site the right thing is easier than adding XSS bugs.

If you ever find yourself in the position of designing an HTML template language, please make the default behavior when including variables be to HTML-escape them. Note that it is much better to err on the side of over-escaping content than under-escaping, because things that are over-escaped will be caught in development/testing because they produce visible artifacts. Things that are under-escaped will not be caught until someone starts stealing your users' valuable information through an XSS hole you left on some obscure page you never really thought about.
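As a minimal sketch of what "escape by default" could look like, here's a toy renderer in Python. Everything about it is an assumption made for illustration: the {{ name }} placeholder syntax, the |raw opt-out marker, and the render function are hypothetical and don't correspond to any of the template systems named above.

```python
import re
from html import escape

# Hypothetical syntax: {{ name }} is escaped, {{ name|raw }} is included verbatim.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*(\|\s*raw\s*)?\}\}")


def render(template: str, context: dict) -> str:
    def substitute(match):
        value = str(context[match.group(1)])
        if match.group(2):
            # The template author explicitly asked for verbatim output.
            return value
        # The default path: HTML-escape whatever the variable holds.
        return escape(value)

    return _PLACEHOLDER.sub(substitute, template)


search_html = render(
    "<b>Search results for &ldquo;{{ q }}&rdquo;</b>:",
    {"q": "<script>alert(1)</script>"},
)

# Over-escaping shows up as visible markup on the page, so it gets caught in testing.
over_escaped = render("{{ body }}", {"body": "<em>hi</em>"})
verbatim = render("{{ body|raw }}", {"body": "<em>hi</em>"})
```

With this arrangement the short thing to type is the safe thing, and the verbatim case is the one that takes extra effort.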

Edit 2011-11-13: Closing comments, because they just attract spam. There are other articles that have said the same better since - for example, search for "google security auto-escape" on your favorite search engine.

Flexy is the answer

(Anonymous) 2007-04-15 09:04 pm (UTC)(link)
For PHP at least... By default placeholder contents are htmlentitied; if you want a variable to output HTML you have to specify the html filter (e.g. {somevalue:h} will output HTML with angle brackets intact, {somevalue} will produce escaped content)... Does the right thing by default and more people ought to use it...

-metapundit
http://www.metapundit.net/sections/blog

RXML

[identity profile] ecmanaut.blogspot.com (from livejournal.com) 2007-04-16 04:35 am (UTC)(link)
The only template language I've come across that really does the right thing is RXML, the Roxen Macro Language (http://docs.roxen.com/roxen/4.5/web_developer_manual/entity/encoding.xml) (since version 2).

By default, it quotes all output to the quoting rules of the content-type the document is served in, or the attribute of some macro tag. Thus it properly handles the different attribute syntaxes understood in XML and HTML, it knows what to quote how for text/javascript, SQL queries used in data fetching for this and that database, and so on, and is generally really pleasant to work with.

The format to include an appropriately (content-type / context sensitively) quoted variable is &form.q; (equivalent of the example given above), or &roxen.version;, and you can override with your own pick of quoting using &form.q:js; for JavaScript quoting, or even a series of quotings applied after one another; &form.q:mysql:html;. Opting out of quoting is available with the quoting scheme "none", so where needed, &form.q:none; does the trick.

Why not just use XSL?

(Anonymous) 2007-04-16 07:25 am (UTC)(link)
Why not simply change the custom, quirky templating language to a standards-based, well-formed one and use XSL? Our company uses 3 different development architectures, but they all use XSL as the templating layer, and it has sped up development and roll-out of all our applications.

[identity profile] smin.livejournal.com 2007-04-16 09:26 am (UTC)(link)
PHP suffers from the same issue with the <?= syntax. I considered writing an extension to add a <?~ operator which would output htmlspecialchars()'d strings but I never got started with it. In the same vein as the statement above on JSP but in PHP, why are the two most important functions from an XSS and SQL injection perspective, htmlspecialchars() and mysql_real_escape_string(), the longest function names in the language?

[identity profile] kragen.livejournal.com 2007-04-21 10:30 pm (UTC)(link)
Nevow's Stan has what I think is a better answer. Rather than putting the information about whether a particular field is supposed to be quoted or not in the template, where it is difficult to verify that it matches the logic producing that field, it puts it in the value to be interpolated. If you put ordinary strings into your template, they are automatically quoted as HTML; if you have a variable that contains raw HTML that you don't want to quote, you have to put it into a special kind of object that has a different "flatten method."

In this way, the XSS-free-ness of the data flow is verifiable incrementally: user inputs start as strings, and if they remain strings, you're safe, because they'll be quoted. If at any stage you do something funky that will avoid a string being quoted, that decision is clearly located at one point in your program, hopefully next to the code that makes sure that string is safe to not be quoted.

This is the dynamically-typed equivalent of Joel Spolsky's suggested Hungarian solution to the problem.
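
The value-based quoting described above can be sketched roughly in Python. This is not Nevow's actual API: the RawHTML wrapper, the flatten function, and render are hypothetical names invented for this illustration, and the only real library call is html.escape.

```python
from html import escape


class RawHTML:
    """Hypothetical wrapper marking markup that is already safe to emit."""

    def __init__(self, markup: str) -> None:
        self.markup = markup


def flatten(value) -> str:
    # Only values explicitly wrapped in RawHTML bypass escaping.
    if isinstance(value, RawHTML):
        return value.markup
    # Everything else, including plain strings from user input, is escaped.
    return escape(str(value))


def render(*parts) -> str:
    return "".join(flatten(part) for part in parts)


page = render(
    RawHTML("<p>Hello, "),
    "<script>alert(1)</script>",  # a plain string, so it gets quoted
    RawHTML("</p>"),
)
```

The decision not to quote lives wherever a RawHTML object is constructed, so that is the one place to audit.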