From time to time I do consulting on Django projects. This time the customer wanted me to review a large codebase with a focus on security. Most large projects have “do not touch this or the sky will fall and you’ll be parsing HTML with regular expressions in hell for eternity” code. That is a big problem: no one except a highly motivated person (like me ;) knows how the project works. A developer who is not familiar with the project’s details can introduce a vulnerability (or several), and it takes a lot of luck for anyone to notice.
Let’s take a look at a sample Django project that consists of three files. The main module (app.py):
from django.conf import settings

if not settings.configured:
    settings.configure(
        DEBUG=True,
        ROOT_URLCONF='app',
        TEMPLATE_DIRS=('.', ),
    )

from django.conf.urls.defaults import patterns
from django.template.base import libraries
from django.shortcuts import render

import tags

libraries['twitter'] = tags.register

urlpatterns = patterns('',
    (r'^$', lambda request: render(
        request,
        'index.html',
        {'link': 'twitter.com/test"></a><script>alert(\'pwned\')</script>'}
    )),
)

if __name__ == '__main__':
    from django.core.management import execute_from_command_line
    execute_from_command_line()
One template (index.html):
{% load twitter %}{% render_twitter_account_link link %}
A module with template tags (tags.py):
import urlparse

from django import template

register = template.Library()


@register.simple_tag
def render_twitter_account_link(url):
    netloc = 'twitter.com'
    if has_scheme(url) or url.startswith(netloc):
        url = ensure_url_scheme(url)
        parsed = urlparse.urlparse(url)
        path = parsed.path.strip('/')
        extracted = path or parsed.fragment.lstrip('!/')
        if (not extracted) or (parsed.netloc != netloc):
            return ''
        name = u'@%s' % extracted
    else:  # '@name' or 'name'
        name = url
        url = u'https://%s/%s' % (netloc, name.lstrip('@'))
    return u'<a href="%(url)s" rel="nofollow">%(name)s</a>' % {
        'url': url,
        'name': name,
    }


def ensure_url_scheme(url):
    if has_scheme(url):
        return url
    return u'http://%s' % url


def has_scheme(url):
    return url.startswith(('http://', 'https://'))
If you run python app.py runserver and go to http://127.0.0.1:8000/, you’ll get a JavaScript alert saying that you’re “pwned”.
Let’s see what the browser got:
<a href="http://twitter.com/test"></a><script>alert('pwned')</script>" rel="nofollow">@test"></a><script>alert('pwned')</script></a>
As you may have noticed, the link context variable contains twitter.com/test"></a><script>alert(\'pwned\')</script>. I hardcoded it just to make things clearer. That link is passed to the template tag: {% render_twitter_account_link link %}. Because it starts with twitter.com, name (look at the template tag’s code) will contain @test"></a><script>alert('pwned')</script> and url will contain http://twitter.com/test"></a><script>alert('pwned')</script>. Both values are then interpolated into the returned HTML without any escaping.
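The walkthrough above can be reproduced without Django. What follows is only a minimal sketch, assuming Python 3 (where urlparse lives in urllib.parse rather than in the Python 2 urlparse module that tags.py imports). The variable names mirror the template tag, and the ensure_url_scheme step is inlined.

```python
from urllib.parse import urlparse  # Python 3 location of urlparse

# The same hardcoded value that app.py puts into the template context.
link = "twitter.com/test\"></a><script>alert('pwned')</script>"

# First branch of render_twitter_account_link: the value starts with
# twitter.com, so a scheme is added and the URL is parsed.
url = "http://%s" % link
parsed = urlparse(url)
name = "@%s" % parsed.path.strip("/")

# Neither url nor name is escaped before being placed into the markup.
html = '<a href="%(url)s" rel="nofollow">%(name)s</a>' % {
    "url": url,
    "name": name,
}

print(parsed.netloc)  # passes the netloc check in the tag
print(name)
print(html)
```

The netloc check passes because the parser stops reading the host at the first slash, so everything after twitter.com/ ends up in the path and, from there, in the output.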
The developer of this template tag forgot what the Django documentation says:
The output from template tags is not automatically run through the auto-escaping filters.
The simplest fix is to escape the template tag’s input as early as we can:
import urlparse

from django import template
from django.utils.html import escape

register = template.Library()


@register.simple_tag
def render_twitter_account_link(url):
    netloc = 'twitter.com'
    url = escape(url)
    if has_scheme(url) or url.startswith(netloc):
        url = ensure_url_scheme(url)
        parsed = urlparse.urlparse(url)
        path = parsed.path.strip('/')
        extracted = path or parsed.fragment.lstrip('!/')
        if (not extracted) or (parsed.netloc != netloc):
            return ''
        name = u'@%s' % extracted
    else:  # '@name' or 'name'
        name = url
        url = u'https://%s/%s' % (netloc, name.lstrip('@'))
    return u'<a href="%(url)s" rel="nofollow">%(name)s</a>' % {
        'url': url,
        'name': name,
    }


def ensure_url_scheme(url):
    if has_scheme(url):
        return url
    return u'http://%s' % url


def has_scheme(url):
    return url.startswith(('http://', 'https://'))
Note what we did: first we imported the escape function via from django.utils.html import escape, then we escaped url (url = escape(url)) as early as possible, i.e. before it is used anywhere else in the code.
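To see what escaping does to the malicious value, here is a small sketch that uses Python 3’s standard html.escape as a stand-in for django.utils.html.escape. That substitution is an assumption made so the snippet runs without Django; the two replace the same five characters, but the exact entity written for the single quote may differ between versions.

```python
from html import escape  # stdlib stand-in for django.utils.html.escape

# The same hardcoded value that app.py puts into the template context.
link = "twitter.com/test\"></a><script>alert('pwned')</script>"
escaped = escape(link)

print(escaped)

# No raw markup characters remain, so the value can no longer close the
# href attribute or open a <script> element.
print(any(ch in escaped for ch in "<>\"'"))

# The prefix check in the tag still recognises the expected host.
print(escaped.startswith("twitter.com"))
```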
Now the browser gets a properly escaped link:
<a href="http://twitter.com/test&quot;&gt;&lt;/a&gt;&lt;script&gt;alert(&#39;pwned&#39;)&lt;/script&gt;" rel="nofollow">@test&quot;&gt;&lt;/a&gt</a>
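One detail of that output is worth a second look: the visible name stops right after the closing a tag instead of carrying the whole payload. A likely explanation is that the entities produced by escaping contain ; and #, which the URL parser treats as delimiters. The sketch below illustrates this under the assumption that Python 3’s urllib.parse and html.escape behave like the Python 2 urlparse module and django.utils.html.escape used in tags.py.

```python
from html import escape            # stand-in for django.utils.html.escape
from urllib.parse import urlparse  # Python 3 location of urlparse

# The same hardcoded value that app.py puts into the template context.
link = "twitter.com/test\"></a><script>alert('pwned')</script>"

# Escape first, then parse: the same order as in the fixed template tag.
parsed = urlparse("http://" + escape(link))

print(parsed.path)      # the part the tag turns into the visible name
print(parsed.params)    # text after the first ';' that follows the last '/'
print(parsed.fragment)  # everything after the first '#'

name = "@%s" % parsed.path.strip("/")
print(name)
```

The path ends at the first ; after its last slash, and the # inside the escaped single quote starts the fragment, so only the beginning of the escaped payload reaches name. The href is unaffected because it is built from the whole escaped url.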