Once someone asked me how would I solve one of Yandex test questions:
What will this program print?
x = [[]] * 3 x[0].append('a') x[1].append('b') x[2].append('c') x[0] = ['d'] print(x)
[['a'], ['b'], ['c']]
[['d'], ['b'], ['c']]
[['d'], ['a', 'b', 'c'], ['a', 'b', 'c']]
[['a', 'b', 'c'], ['a', 'b', 'c'], ['a', 'b', 'c']]
Comment your answer.
Here’s how.
Obviously, it will print [['d'], ['a', 'b', 'c'], ['a', 'b', 'c']]
. There are two places where x[0] == ['d']
:
[['d'], ['b'], ['c']]
[['d'], ['a', 'b', 'c'], ['a', 'b', 'c']]
As x[1] is x[2]
, [['d'], ['a', 'b', 'c'], ['a', 'b', 'c']]
is the answer.
It will print [['d'], ['a', 'b', 'c'], ['a', 'b', 'c']]
. As first three lines modify items of list (via calling append
method of item), and *
copies & concatenates list, x[0] is x[1] is x[2]
until x[0] = ['d']
statement. x[0] = ['d']
modifies list, not item, so it does not matter what x[0]
was previously.
For sequences, according to documentation, a * n
(or n * a
) is repetition of a
n
times. Documentation also states that *
operator makes shallow copies of sequence (that is, *
always copies sequence, but not its contents). It seems logical: repetition is basically concatenation of copies of sequence. Simple illustration:
>>> a = object()
>>> [a] * 3
[<object object at 0xb70a88d0>, <object object at 0xb70a88d0>, <object object at 0xb70a88d0>]
List which is returned by [a] * 3
is a new list, but with old contents. As you can see, addresses of all three objects are the same. These objects are the same:
>>> b = [a] * 3
>>> b
[<object object at 0xb70a88d0>, <object object at 0xb70a88d0>, <object object at 0xb70a88d0>]
>>> b[0]
<object object at 0xb70a88d0>
>>> b[0] is b[1] is b[2]
True
That’s why repetition behaves like this (sometimes such consistent behaviour makes some newbies upset):
>>> c = [[1,2,3]]
>>> d = c * 3
>>> d
[[1, 2, 3], [1, 2, 3], [1, 2, 3]]
>>> d[0] is d[1] is d[2]
True
>>> del d[0][0]
>>> d
[[2, 3], [2, 3], [2, 3]]
It’s important to know what code is working with (for example, whose method it is calling): is it “container” or “inner” object.
One of fairly common chunks of code among projects I’ve seen is grouper
recipe from itertools’ docs:
def grouper(n, iterable, fillvalue=None):
"Collect data into fixed-length chunks or blocks"
# grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
args = [iter(iterable)] * n
return zip_longest(*args, fillvalue=fillvalue)
It builds args
list with copies of list_iterator
instance (result of iter(somelist)
) via repetition.
Now it is clear (I hope) how it works at “Python” level. Let’s see what happens under the hood with CPython 3.3.0. Fastest way to figure out what CPython will do is to disassemble code object:
>>> c = compile('[[]] * 3', '<string>', 'eval')
>>> import dis; dis.dis(c)
1 0 BUILD_LIST 0
3 BUILD_LIST 1
6 LOAD_CONST 0 (3)
9 BINARY_MULTIPLY
10 RETURN_VALUE
Aha! It uses BINARY_MULTIPLY
opcode. Time to look at small part of famous switch (see why it is famous):
TARGET(BINARY_MULTIPLY)
w = POP();
v = TOP();
x = PyNumber_Multiply(v, w);
So, it calls PyNumber_Multiply
that first tries to call __*mul__
, and then, if it got NotImplemented
, it uses tp_as_sequence
(method definitions) of object’s type (o->ob_type
). Here’s interesting thing: this function is very similar to PySequence_Repeat
function from Python/C API, but former first calls __*mul__
, and then uses sq_repeat
field of tp_as_sequence
, and the latter does exactly opposite: first it tries to use sq_repeat
, and then falls back to __*mul__
(well, as abstract base class collections.abc.Sequence
exists, it is not a surprise that there is a group of functions in Python/C API related to sequence protocol). As these two functions are quite similar, I’ll focus on one of them—the one from Python/C API.
PySequence_Repeat
function, is “the equivalent of the Python expression o * count
”. Documentation says that it has following signature: PyObject* PySequence_Repeat(PyObject *o, Py_ssize_t count)
. It takes a pointer to PyObject
(o
), and Py_ssize_t
(count
) as arguments. o
is a pointer to a sequence (PyListObject
, that is, instance of list
, for [] * count
). Function returns pointer to PyObject
, or NULL
. Plus, on “Python” level it can raise TypeError
or SystemError
.
Here’s “meat” of PySequence_Repeat
function:
PySequence_Repeat(*o, count) {
/* Case 1 */
m = o->ob_type->tp_as_sequence;
if (m && m->sq_repeat)
return m->sq_repeat(o, count);
/* Case 2 */
if (PySequence_Check(o)) {
n = PyLong_FromSsize_t(count);
result = binary_op1(o, n, NB_SLOT(nb_multiply));
if (result != Py_NotImplemented)
return result;
}
}
So, there are two cases where repetition may work.
tp_as_sequence
which contains sq_repeat
fieldObject’s type has tp_as_sequence
which contains sq_repeat
field. Function the latter is pointing at has the same signature as PySequence_Repeat
has, so sq_repeat(o, count)
works.
For [] * count
, o
is PyListObject
. PyListObject
is PyObject
, thus it has ob_type
field too. In case of list
instance, ob_type
points to PyList_Type
. For list
, tp_as_sequence
points to list_as_sequence
(see sequence methods definition), and it is m
in definitions (“pseudo-C”—up, and real) of PySequence_Repeat
. Finally, sq_repeat
of list_as_sequence
contains pointer to list_repeat
function, which actually implements list repeating, and has the same argument and return types as PySequence_Repeat
has: static PyObject* list_repeat(PyListObject *a, Py_ssize_t n)
.
In addition to list
, tp_as_sequence
is not 0 for some other types.
tuple
, str
, bytes
, bytearray
PyTuple_Type
, PyUnicode_Type
, PyBytes_Type
, PyByteArray_Type
all have their tp_as_sequence
field set to pointer to corresponding PySequenceMethods
structure: tuple_as_sequence
, unicode_as_sequence
, bytes_as_sequence
, bytearray_as_sequence
. Corresponding PySequenceMethods
structure contains pointer to corresponding sequence repeating function: tuplerepeat
, unicode_repeat
, bytes_repeat
, bytearray_repeat
.
dict
, mappingproxy
PyDict_Type
has tp_as_sequence
set to &dict_as_sequence
, but sq_repeat
field of dict_as_sequence
is set to 0. Same with PyDictProxy_Type
.
memoryview
, weakproxy
, weakcallableproxy
, range
sq_repeat
of memory_as_sequence
(proxy_as_sequence
, for references; range_as_sequence
, for range
) of PyMemoryView_Type
(_PyWeakref_ProxyType
, _PyWeakref_CallableProxyType
; PyRange_Type
) is set to 0.
Interesting fact: with old Python (say, 2.7) it was possible to use range
like this:
>>> range(3) * 5
[0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]
It does not mean something is broken in Python 3.*, because now range
is what xrange
was (and you couldn’t repeat xrange
, actually):
>>> xrange(3) * 5
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for *: 'xrange' and 'int'
With Python 2.7 range
returns list. Repetition works with lists. With Python 3.* range
returns instance of class range
. It does not support repetition.
set
, frozenset
Both PySet_Type
and PyFrozenSet_Type
have their tp_as_sequence
field point to set_as_sequence
method definition, where sq_repeat
is set to 0.
Other built-in types have tp_as_sequence
set to zero (for example, boolean—PyBool_Type
and None
—PyNone_Type
). This means that only for list
, tuple
, str
, bytes
, and bytearray
built-ins case 1 will work.
o
provides sequence protocol and defines __*mul__
methodIf object o
provides sequence protocol and defines __*mul__
method, result = o.__*mul__(int(count))
, otherwise TypeError
will be raised. If result
is not NotImplemented
, result
is actual result.
There is nothing “tricky” about rest of the code, except what output to consider printed. I’d consider “printed” only things written to stdout
. In such case even following code (p.py
file) executed via python3.3 p.py > out.txt
will produce reasonable result:
import gc
gc.set_threshold(1)
gc.set_debug(gc.DEBUG_STATS)
x = [[]] * 3
x[0].append('a')
x[1].append('b')
x[2].append('c')
x[0] = ['d']
print(x)
GC’s debug output will be written to stderr
, and will be visible (as >
redirects stdout
, not stderr
), but out.txt
will contain answer:
[['d'], ['a', 'b', 'c'], ['a', 'b', 'c']]
This blog is about things I encounter while doing web and non-web software development.