Reproducible builds and django.utils.feedgenerator

Recently I’ve been working towards making builds of this blog reproducible. My goal was to be able to use plain diff to spot differences between the generated files.

But there was a problem. Atom 1.0 feeds (e.g., /notes/feed/, /tags/django/feed/) are generated with my fork of django.utils.feedgenerator. Both the original and the fork generate XML with SimplerXMLGenerator, a subclass of xml.sax.saxutils.XMLGenerator (an implementation of the ContentHandler interface), and both pass elements’ attributes to startElement and startElementNS as regular dicts. As a result, the order of XML elements’ attributes in the resulting feeds was random: the feeds changed textually, but not semantically. Although this is inconvenient for my use case (where I would rather not rely on specialized XML comparison tools), such behavior is valid according to the XML specification:

Note that the order of attribute specifications in a start-tag or empty-element tag is not significant.
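
To illustrate the problem, here is a minimal sketch (not code from the blog engine; the render helper, the link element and the example.org URL are made up for illustration). XMLGenerator writes attributes in whatever order the mapping yields them, so two dicts with the same items can produce different text. On current Python versions dicts keep insertion order, so the sketch varies insertion order to get the effect; on older interpreters the order could also change between runs.

```python
import io
from xml.sax.saxutils import XMLGenerator


def render(attrs):
    # Hypothetical helper: serialize a single <link> element to a string.
    out = io.StringIO()
    gen = XMLGenerator(out, encoding="utf-8")
    gen.startElement("link", attrs)
    gen.endElement("link")
    return out.getvalue()


# Same attributes, different iteration order of the mapping.
first = render({"rel": "alternate", "href": "https://example.org/"})
second = render({"href": "https://example.org/", "rel": "alternate"})

print(first)   # <link rel="alternate" href="https://example.org/"></link>
print(second)  # <link href="https://example.org/" rel="alternate"></link>
```

Both outputs describe the same element, yet a plain diff reports them as different.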

Since the order is not significant, I am free to pick a fixed one, and I decided to take advantage of that. Both startElement and startElementNS assume only that the attrs argument behaves like a mapping (see lines 170 and 195 of Lib/xml/sax/saxutils.py). OrderedDict is a mapping, just like dict, so attrs (the attributes of an XML element) can be passed as an OrderedDict instance to keep the attributes in a chosen order. The least intrusive change (it’s a fork, after all) is to override the two methods and sort attrs there (_order_attrs) before passing them to the superclass implementation (XMLGenerator):

import operator
import collections
from xml.sax.saxutils import XMLGenerator

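# Sort attributes by name, i.e. the first item of each (name, value) pair.
_order_attrs_key = operator.itemgetter(0)

def _order_attrs(attrs):
    # Return the same attributes as a mapping that iterates in sorted order.
    return collections.OrderedDict(
        sorted(attrs.items(), key=_order_attrs_key))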

class SimplerXMLGenerator(XMLGenerator):

    def startElement(self, name, attrs):
        return super().startElement(name, _order_attrs(attrs))

    def startElementNS(self, name, qname, attrs):
        return super().startElementNS(name, qname, _order_attrs(attrs))

    # ...here goes the rest (already present in feedgenerator)

That’s what I did. Now my blog engine no longer reorders attributes in feeds at random, and the feeds are still perfectly valid XML :).

