How to make your webapp amazingly vulnerable to XSS attacks
I had started, and at some point may continue, a big long livejournal post about a rather technical topic - the ways in which people make themselves vulnerable to XSS attacks - when I ran across this example, which is just too horrid not to post about on its own.
A certain major weblogging service, which will remain nameless for a while (though according to some disclosure policies the world should already have been told about this), offers users the ability to search the comments that have been made on their blog, filtering by a number of different criteria. When you run such a search, the page you get back includes a box titled "displaying comments where the Name is “whatever”", containing the matching comments.
For reasons known only to the developers, they do not actually write the comment box's title out directly when rendering the page. Instead, they write out a placeholder and then set the box's title from a javascript function that's invoked by the page's onload event.
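Something along these lines (my reconstruction; only the boxTitle id and the onload hook come from their actual page):

    <body onload="setTitle()">
        ...
        <!-- placeholder; setTitle() fills this in once the page has loaded -->
        <div id="boxTitle"></div>
        ...
    </body>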
Here's the javascript function. (Code slightly modified so as to obscure the guilty party):
function setTitle() {
    var filter_type, filter_arg;
    filter_type = 'Name';
    filter_arg = 'XXwhateverXX';
    var btitle_element = document.getElementById('boxTitle');
    btitle_element.innerHTML = 'Showing comments where the ' + filter_type + ' is “' + filter_arg + '”.';
}
Where I put XXwhateverXX, they actually have literally exactly what was searched for. That is, if you search for "G'kar", that line of code comes out to:
filter_arg = 'G'kar';
Note that this search parameter is taken from the url. All an attacker needs to do is construct a url with a search parameter of the form ';evilJavaScript// and their evil javascript will get executed when the victim clicks on the url.
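For instance, using alert(document.cookie) as a stand-in for the attacker's payload, a search parameter of

    ';alert(document.cookie)//

turns that generated line into

    filter_arg = '';alert(document.cookie)//';

The payload's leading quote closes the string, the alert runs as soon as setTitle() does, and the trailing // comments out the leftover '; so the function still parses.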
Now, almost certainly the first stab at fixing this will be to backslash any single quotes in the url-supplied value. However, that won't be sufficient.
If they do that, the attacker can construct his url to have the search parameter </script><script>evilJavaScript</script>, and the attacker's evil javascript will get executed as the page loads: the attacker's </script> closes the page's own javascript chunk early, and the <script> block that follows it is parsed as a brand new one.
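With single quotes backslashed, that payload passes through untouched, since it contains no quotes, and the generated line comes out as (alert again standing in for the evil javascript):

    filter_arg = '</script><script>alert(document.cookie)</script>';

The HTML parser doesn't know or care that the embedded </script> sits inside a javascript string; the tags win.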
Now here's the kicker: if they then decide to simply backslash every non-alphanumeric character in the user input (i.e. not just the single quotes), the same attack url will still work. It just won't trigger the evil javascript until after the page has loaded.
The reason is that they're replacing btitle_element.innerHTML. Therefore, if filter_arg contains anything that's special to html, it'll get interpreted as markup rather than as text.
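Concretely, with the same payload (and the caveat that whether a bare <script> element added via innerHTML actually executes varies by browser, so take this as a sketch of the mechanics rather than a guaranteed exploit):

    // What the server emits after backslashing every non-alphanumeric character:
    filter_arg = '\<\/script\>\<script\>alert\(document\.cookie\)\<\/script\>';

    // The backslashes keep the HTML parser from seeing a literal </script>,
    // so the page's own script block no longer gets closed early. But \< is
    // just < to javascript, so the string's *value* is still
    //   </script><script>alert(document.cookie)</script>
    // and setTitle() hands that markup straight to innerHTML, where the
    // html-special characters are interpreted as tags rather than text.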
In order to fix this they need to either HTML-escape the filter argument before inserting it into that javascript (in addition to backslashing single quotes), or backslash everything non-alphanumeric in filter_arg and then, in javascript, perform an html escaping as they build the new value of btitle_element.innerHTML.
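A minimal sketch of that second option (htmlEscape is a name of my own; boxTitle, filter_type, and filter_arg are theirs): the server keeps backslashing everything non-alphanumeric so the value can't escape the javascript string, and the page then html-escapes it before it ever reaches innerHTML:

    function htmlEscape(s) {
        return s.replace(/&/g, '&amp;')
                .replace(/</g, '&lt;')
                .replace(/>/g, '&gt;')
                .replace(/"/g, '&quot;');
    }

    function setTitle() {
        var filter_type = 'Name';
        var filter_arg = 'XXwhateverXX';   // already backslash-escaped by the server
        var btitle_element = document.getElementById('boxTitle');
        btitle_element.innerHTML = 'Showing comments where the ' +
            htmlEscape(filter_type) + ' is “' + htmlEscape(filter_arg) + '”.';
    }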
They've managed to write the code so that the value has to be carefully escaped in two totally different ways before it's safe.
Update: I had a technical detail wrong, one which must make writing browsers painful when it comes to parsing tag-soup HTML.