dtm ([personal profile] dtm) wrote 2007-04-15 11:42 am

Most HTML templating languages are written incorrectly

Continuing my pattern of occasional technical posts, just so that my journal won't be completely dormant, here's another one:

If you do much web development at all, you probably work with a template language of some kind. You know, the kind of thing where you write HTML with various placeholders in spots that get filled in by the web application - examples include JSP pages, Django's template system, Smarty templates, PHP pages, and HTML::Mason.

Anyway, the problem with virtually every HTML templating language out there is that they make it easier for the person writing HTML templates to add an XSS hole than to avoid it. This isn't a matter of making it possible for page writers to shoot themselves in the foot - that's always going to be possible, given any reasonable system - it's a matter of making it easier to do than to avoid.

I'm going to pick JSP as an example, because it's just horrendous in this regard, especially in JSP 2.0, where they make doing the wrong thing extra easy. Let's say you have a page where you echo back to the user something they typed in. Say you have a search page and you preface search results with something like:

Search results for “pirate monkeys”:

That is what appears when the user types "pirate monkeys" into the search box. Now, in JSP you might well be tempted to code that as:

<b>Search results for &ldquo;${param.q}&rdquo;</b>:

And in your testing, this would appear to work just fine (assuming that the search box wound up in a parameter named q). However, if you did that you'd be opening yourself up to an XSS attack - in short, you'd be making it possible for someone to construct a URL that they can hand to your users to cause nasty things to happen. Instead, the correct way to write that is:

<b>Search results for &ldquo;<c:out value="${param.q}"/>&rdquo;</b>:

Now, which is easier to type? Which one is therefore more likely to be typed in a hurry when you've got to get the site up by the end of the week?
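
To make the difference concrete, here is a minimal Python sketch (not JSP) of the two approaches. The function names and the attack string are made up for illustration, and `html.escape` from Python's standard library stands in for roughly what the `<c:out>` tag does:

```python
from html import escape


def render_verbatim(query: str) -> str:
    # Mirrors the ${param.q} version: the value is pasted into the markup untouched.
    return f"<b>Search results for &ldquo;{query}&rdquo;</b>:"


def render_escaped(query: str) -> str:
    # Mirrors the <c:out> version: markup characters are turned into entities first.
    return f"<b>Search results for &ldquo;{escape(query)}&rdquo;</b>:"


# A hypothetical value an attacker could put in the q parameter of a crafted link.
attack = "<script>alert(document.cookie)</script>"

print(render_verbatim(attack))  # the script tag reaches the browser intact
print(render_escaped(attack))   # the browser only ever sees harmless text
```

For an ordinary query like "pirate monkeys" the two functions produce identical output, which is exactly why the unsafe version sails through casual testing.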

JSPs have another way of including dynamic content, putting snippets of Java code inside <%= %> tags, but it suffers from the same problem. The default behavior is to include the text verbatim, not doing any HTML escaping, and that's just wrong.

Other templating languages make it easier to properly escape output, but still tend to make it easier to include stuff verbatim (which is rarely what you want) than to include it properly escaped. For example, a snippet on the Django page linked to above gives:

{% for story in story_list %}
<h2>
<a href="{{ story.get_absolute_url }}">
{{ story.headline|upper }}
</a>
</h2>
<p>{{ story.tease|truncatewords:"100" }}</p>
{% endfor %}


Which is all fine and good if you never let your users submit story content. The XSS-proof version of that snippet is:

{% for story in story_list %}
<h2>
<a href="{{ story.get_absolute_url|escape }}">
{{ story.headline|upper|escape }}
</a>
</h2>
<p>{{ story.tease|truncatewords:"100"|escape }}</p>
{% endfor %}


They make it easier to escape variable values than JSP does, but don't make it the default. The other templating languages cited above suffer from a similar design flaw. HTML::Mason does at least allow the site administrator to set a site-wide "default escape", so that on that site doing the right thing is easier than adding XSS bugs.
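
As a rough illustration of what escaping by default could look like, here is a small Python sketch of a toy renderer. It is not how Django or HTML::Mason actually work: the `render` function, the `{{ name }}` placeholder handling, and the `|raw` opt-out marker are all hypothetical names invented for this example.

```python
import re
from html import escape

# Matches {{ name }} and {{ name|raw }}; "|raw" is a made-up opt-out marker.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)(\|raw)?\s*\}\}")


def render(template: str, context: dict) -> str:
    def substitute(match):
        value = str(context[match.group(1)])
        # Escaping is the default; verbatim output has to be asked for explicitly.
        return value if match.group(2) else escape(value)

    return PLACEHOLDER.sub(substitute, template)


template = "<h2>{{ headline }}</h2><div>{{ body|raw }}</div>"
context = {
    "headline": "<img src=x onerror=alert(1)>",  # untrusted, user-submitted
    "body": "<p>trusted markup</p>",             # deliberately included as HTML
}
print(render(template, context))
```

With this arrangement the short, lazy spelling is the safe one, and the rarer verbatim case is the one that takes extra typing.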

If you ever find yourself in the position of designing an HTML template language, please make HTML-escaping the default behavior when including variables. Note that it is much better to err on the side of over-escaping content than under-escaping, because things that are over-escaped will be caught in development/testing: they produce visible artifacts. Things that are under-escaped will not be caught until someone starts stealing your users' valuable information through an XSS hole you left on some obscure page you never really thought about.
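
Here is a quick Python sketch of why over-escaping is the loud failure. It uses `html.unescape` as a stand-in for what a browser displays after decoding entities once; the sample headline is made up.

```python
from html import escape, unescape

headline = "Fish & Chips <reviewed>"

once = escape(headline)  # correctly escaped
twice = escape(once)     # accidentally escaped a second time

# A browser decodes entities exactly once when displaying text.
shown_once = unescape(once)
shown_twice = unescape(twice)

print(shown_once)   # the reader sees the original headline
print(shown_twice)  # the reader sees stray "&amp;" and "&lt;" litter
```

The double-escaped headline looks obviously broken on the page, so somebody notices and fixes it. An unescaped one looks perfect until the day the headline contains a script tag.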

Edit 2011-11-13: Closing comments, because they just attract spam. Other articles have since said the same thing better - for example, search for "google security auto-escape" on your favorite search engine.

[identity profile] kragen.livejournal.com 2007-04-23 11:00 pm (UTC)
The code is in http://pobox.com/~kragen/sw/laptoptable.py now, and reading over it I notice that I'm not sure whether that's what I did or not.

Because it's like Nevow, it doesn't really have a separate textual template as such --- it just has a tree of elements, each of which includes its children rendered as HTML. The <script> tag renders its children differently --- it just asks them to convert themselves to strings rather than HTML --- because it has a CDATA content model in the HTML DTD.

You could treat "Search results for #{term}:" as syntactic sugar for ["Search results for ", as_html(term), ":"] though. Hmm...
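
A minimal Python sketch of that desugaring idea follows. It is not taken from laptoptable.py: the `Raw` wrapper, this particular `as_html`, and `search_heading` are hypothetical, written only to show how a list of pieces can be rendered so that plain strings are escaped exactly once.

```python
from html import escape


class Raw:
    """Marks text that is already HTML and must not be escaped again."""

    def __init__(self, markup: str):
        self.markup = markup


def as_html(node) -> Raw:
    # Already-rendered HTML passes through unchanged.
    if isinstance(node, Raw):
        return node
    # A list renders as the concatenation of its rendered children.
    if isinstance(node, (list, tuple)):
        return Raw("".join(as_html(child).markup for child in node))
    # Anything else is treated as plain text and escaped.
    return Raw(escape(str(node)))


def search_heading(term: str) -> str:
    # Hand-written expansion of "Search results for #{term}:"
    return as_html(["Search results for ", as_html(term), ":"]).markup


print(search_heading("<b>pirates</b>"))
```

Because `as_html` returns a `Raw` value, calling it on the term first and then on the whole list does not escape the term twice.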