[[]] * 3 with Python 3: 3 answers

Once someone asked me how would I solve one of Yandex test questions:

What will this program print?
x = [[]] * 3
x[0].append('a')
x[1].append('b')
x[2].append('c')
x[0] = ['d']

print(x)
[['a'], ['b'], ['c']]
[['d'], ['b'], ['c']]
[['d'], ['a', 'b', 'c'], ['a', 'b', 'c']]
[['a', 'b', 'c'], ['a', 'b', 'c'], ['a', 'b', 'c']]
Comment your answer.

Here’s how.

Short and boring answer

Obviously, it will print [['d'], ['a', 'b', 'c'], ['a', 'b', 'c']]. There are two places where x[0] == ['d']:

[['d'], ['b'], ['c']]
[['d'], ['a', 'b', 'c'], ['a', 'b', 'c']]

As x[1] is x[2], [['d'], ['a', 'b', 'c'], ['a', 'b', 'c']] is the answer.

Shorter and less boring answer

It will print [['d'], ['a', 'b', 'c'], ['a', 'b', 'c']]. As first three lines modify items of list (via calling append method of item), and * copies & concatenates list, x[0] is x[1] is x[2] until x[0] = ['d'] statement. x[0] = ['d'] modifies list, not item, so it does not matter what x[0] was previously.

Longer, and (possibly) even less boring answer

For sequences, according to documentation, a * n (or n * a) is repetition of a n times. Documentation also states that * operator makes shallow copies of sequence (that is, * always copies sequence, but not its contents). It seems logical: repetition is basically concatenation of copies of sequence. Simple illustration:

>>> a = object()
>>> [a] * 3
[<object object at 0xb70a88d0>, <object object at 0xb70a88d0>, <object object at 0xb70a88d0>]

List which is returned by [a] * 3 is a new list, but with old contents. As you can see, addresses of all three objects are the same. These objects are the same:

>>> b = [a] * 3
>>> b
[<object object at 0xb70a88d0>, <object object at 0xb70a88d0>, <object object at 0xb70a88d0>]
>>> b[0]
<object object at 0xb70a88d0>
>>> b[0] is b[1] is b[2]
True

That’s why repetition behaves like this (sometimes such consistent behaviour makes some newbies upset):

>>> c = [[1,2,3]]
>>> d = c * 3
>>> d
[[1, 2, 3], [1, 2, 3], [1, 2, 3]]
>>> d[0] is d[1] is d[2]
True

>>> del d[0][0]
>>> d
[[2, 3], [2, 3], [2, 3]]

It’s important to know what code is working with (for example, whose method it is calling): is it “container” or “inner” object.

One of fairly common chunks of code among projects I’ve seen is grouper recipe from itertools’ docs:

def grouper(n, iterable, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

It builds args list with copies of list_iterator instance (result of iter(somelist)) via repetition.

Now it is clear (I hope) how it works at “Python” level. Let’s see what happens under the hood with CPython 3.3.0. Fastest way to figure out what CPython will do is to disassemble code object:

>>> c = compile('[[]] * 3', '<string>', 'eval')
>>> import dis; dis.dis(c)
  1           0 BUILD_LIST               0 
              3 BUILD_LIST               1 
              6 LOAD_CONST               0 (3) 
              9 BINARY_MULTIPLY      
             10 RETURN_VALUE

Aha! It uses BINARY_MULTIPLY opcode. Time to look at small part of famous switch (see why it is famous):

TARGET(BINARY_MULTIPLY)
    w = POP();
    v = TOP();
    x = PyNumber_Multiply(v, w);

So, it calls PyNumber_Multiply that first tries to call __*mul__, and then, if it got NotImplemented, it uses tp_as_sequence (method definitions) of object’s type (o->ob_type). Here’s interesting thing: this function is very similar to PySequence_Repeat function from Python/C API, but former first calls __*mul__, and then uses sq_repeat field of tp_as_sequence, and the latter does exactly opposite: first it tries to use sq_repeat, and then falls back to __*mul__ (well, as abstract base class collections.abc.Sequence exists, it is not a surprise that there is a group of functions in Python/C API related to sequence protocol). As these two functions are quite similar, I’ll focus on one of them—the one from Python/C API.

PySequence_Repeat function, is “the equivalent of the Python expression o * count”. Documentation says that it has following signature: PyObject* PySequence_Repeat(PyObject *o, Py_ssize_t count). It takes a pointer to PyObject (o), and Py_ssize_t (count) as arguments. o is a pointer to a sequence (PyListObject, that is, instance of list, for [] * count). Function returns pointer to PyObject, or NULL. Plus, on “Python” level it can raise TypeError or SystemError.

Here’s “meat” of PySequence_Repeat function:

PySequence_Repeat(*o, count) {
    /* Case 1 */
    m = o->ob_type->tp_as_sequence;
    if (m && m->sq_repeat)
        return m->sq_repeat(o, count);

    /* Case 2 */
    if (PySequence_Check(o)) {
        n = PyLong_FromSsize_t(count);
        result = binary_op1(o, n, NB_SLOT(nb_multiply));
        if (result != Py_NotImplemented)
            return result;
    }
}

So, there are two cases where repetition may work.

Case 1: object’s type has `tp_as_sequence` which contains `sq_repeat` field

Object’s type has tp_as_sequence which contains sq_repeat field. Function the latter is pointing at has the same signature as PySequence_Repeat has, so sq_repeat(o, count) works.

For [] * count, o is PyListObject. PyListObject is PyObject, thus it has ob_type field too. In case of list instance, ob_type points to PyList_Type. For list, tp_as_sequence points to list_as_sequence (see sequence methods definition), and it is m in definitions (“pseudo-C”—up, and real) of PySequence_Repeat. Finally, sq_repeat of list_as_sequence contains pointer to list_repeat function, which actually implements list repeating, and has the same argument and return types as PySequence_Repeat has: static PyObject* list_repeat(PyListObject *a, Py_ssize_t n).

In addition to list, tp_as_sequence is not 0 for some other types.

`tuple`, `str`, `bytes`, `bytearray`

PyTuple_Type, PyUnicode_Type, PyBytes_Type, PyByteArray_Type all have their tp_as_sequence field set to pointer to corresponding PySequenceMethods structure: tuple_as_sequence, unicode_as_sequence, bytes_as_sequence, bytearray_as_sequence. Corresponding PySequenceMethods structure contains pointer to corresponding sequence repeating function: tuplerepeat, unicode_repeat, bytes_repeat, bytearray_repeat.

`dict`, `mappingproxy`

PyDict_Type has tp_as_sequence set to &dict_as_sequence, but sq_repeat field of dict_as_sequence is set to 0. Same with PyDictProxy_Type.

`memoryview`, `weakproxy`, `weakcallableproxy`, `range`

sq_repeat of memory_as_sequence (proxy_as_sequence, for references; range_as_sequence, for range) of PyMemoryView_Type (_PyWeakref_ProxyType, _PyWeakref_CallableProxyType; PyRange_Type) is set to 0.

Interesting fact: with old Python (say, 2.7) it was possible to use range like this:

>>> range(3) * 5
[0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]

It does not mean something is broken in Python 3.*, because now range is what xrange was (and you couldn’t repeat xrange, actually):

>>> xrange(3) * 5
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for *: 'xrange' and 'int'

With Python 2.7 range returns list. Repetition works with lists. With Python 3.* range returns instance of class range. It does not support repetition.

`set`, `frozenset`

Both PySet_Type and PyFrozenSet_Type have their tp_as_sequence field point to set_as_sequence method definition, where sq_repeat is set to 0.

Other built-in types have tp_as_sequence set to zero (for example, boolean—PyBool_Type and None—PyNone_Type). This means that only for list, tuple, str, bytes, and bytearray built-ins case 1 will work.

Case 2: object `o` provides sequence protocol and defines `*mul` method

If object o provides sequence protocol and defines __*mul__ method, result = o.__*mul__(int(count)), otherwise TypeError will be raised. If result is not NotImplemented, result is actual result.

The rest

There is nothing “tricky” about rest of the code, except what output to consider printed. I’d consider “printed” only things written to stdout. In such case even following code (p.py file) executed via python3.3 p.py > out.txt will produce reasonable result:

import gc


gc.set_threshold(1)
gc.set_debug(gc.DEBUG_STATS)


x = [[]] * 3
x[0].append('a')
x[1].append('b')
x[2].append('c')
x[0] = ['d']

print(x)

GC’s debug output will be written to stderr, and will be visible (as > redirects stdout, not stderr), but out.txt will contain answer:

[['d'], ['a', 'b', 'c'], ['a', 'b', 'c']]

Note metadata

Publication date

February 20, 2013

Tags

About

This blog is about things I encounter while doing web and non-web software development.

Recent notes

All notes