Recently, I had a discussion on SO (see it for the context) about the two following pieces of code:
res = [d.get(next((k for k in d if k in s), None), s) for s in lst]
And:
res = [next((v for k,v in d.items() if k in s), s) for s in lst]
Both iterate through the strings s in a list lst and, for each s, look for a key k of the dict d that is contained in s (k in s). If such a key is found, the associated value is returned; otherwise s itself is returned. I'm pretty sure the second piece of code is faster than the first because, for each s, there is no lookup in the dictionary, just an iteration over the (key, value) pairs.
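To make the intended behaviour concrete, here is a tiny example. The contents of d and lst, and the names res_get and res_items, are made up for illustration and are not part of the original discussion:

```python
# Toy data, assumed only for this illustration.
d = {"cat": "feline", "dog": "canine"}
lst = ["bobcat", "hotdog", "bird"]

# First version: find the first matching key, then look it up with d.get.
res_get = [d.get(next((k for k in d if k in s), None), s) for s in lst]

# Second version: iterate over (key, value) pairs and yield the value directly.
res_items = [next((v for k, v in d.items() if k in s), s) for s in lst]

print(res_get)    # ['feline', 'canine', 'bird']
print(res_items)  # ['feline', 'canine', 'bird']
```

"bird" contains no key of d, so it is returned unchanged by both versions.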
The question is:
How to check that this is really what happens under the hood?
I tried the dis module for the first time, but the result was disappointing (Python 3.6.3):
>>> dis.dis("[d.get(next((k for k in d if k in s), None), s) for s in lst]")
1 0 LOAD_CONST 0 (<code object <listcomp> at 0x7f8e302039c0, file "<dis>", line 1>)
2 LOAD_CONST 1 ('<listcomp>')
4 MAKE_FUNCTION 0
6 LOAD_NAME 0 (lst)
8 GET_ITER
10 CALL_FUNCTION 1
12 RETURN_VALUE
>>> dis.dis("[next((v for k,v in d.items() if k in s), s) for s in lst]")
1 0 LOAD_CONST 0 (<code object <listcomp> at 0x7f8e302038a0, file "<dis>", line 1>)
2 LOAD_CONST 1 ('<listcomp>')
4 MAKE_FUNCTION 0
6 LOAD_NAME 0 (lst)
8 GET_ITER
10 CALL_FUNCTION 1
12 RETURN_VALUE
How do I get more detailed information?
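For what it's worth, the output above only shows the outer code object; the comprehension and the generator expression are separate code objects stored in co_consts. A minimal sketch of one way to reach them follows. The helper name iter_code_objects is mine, and this is only one possible approach:

```python
import dis
import types

def iter_code_objects(code):
    """Yield `code` and, recursively, every code object in its constants."""
    yield code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from iter_code_objects(const)

src = "[d.get(next((k for k in d if k in s), None), s) for s in lst]"
top = compile(src, "<dis>", "eval")  # compiled only, never executed

for co in iter_code_objects(top):
    print("Disassembly of", co.co_name)
    dis.disassemble(co)  # disassembles this single code object only
    print()
```

On Python 3.7 and later, dis.dis recurses into nested code objects by itself, so this helper is mainly useful on 3.6.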
EDIT
As suggested by @abarnert in the first comment, I timed both solutions with timeit. I played around with the following code:
from timeit import timeit

from faker import Faker  # third-party package

fake = Faker()
d = {fake.word(): fake.word() for _ in range(50000)}
lst = fake.words(500000)

def f():
    return [d.get(next((k for k in d if k in s), None), s) for s in lst]

def g():
    return [next((v for k, v in d.items() if k in s), s) for s in lst]

print(timeit(f, number=1))
print(timeit(g, number=1))
assert f() == g()
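For anyone who wants to reproduce this without installing faker, here is a reduced sketch that uses only the standard library. The random_word helper, the seed, and the sizes (much smaller than above) are my assumptions, so the absolute timings will not match the original run:

```python
import random
import string
from timeit import repeat

random.seed(0)  # fixed seed so that successive runs use the same data

def random_word(length):
    """Return a random lowercase word (hypothetical stand-in for fake.word())."""
    return "".join(random.choice(string.ascii_lowercase) for _ in range(length))

# Assumed sizes, far smaller than in the faker-based benchmark.
d = {random_word(3): random_word(5) for _ in range(200)}
lst = [random_word(8) for _ in range(2000)]

def f():
    return [d.get(next((k for k in d if k in s), None), s) for s in lst]

def g():
    return [next((v for k, v in d.items() if k in s), s) for s in lst]

assert f() == g()

# Take the best of a few runs to reduce noise from a single measurement.
print("f:", min(repeat(f, number=1, repeat=3)))
print("g:", min(repeat(g, number=1, repeat=3)))
```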
Maybe I missed something but, to my surprise, the first piece of code (f) was always faster than the second (g). Hence the secondary question: does anyone have an explanation?
EDIT 2
Here are the most interesting parts of the disassembled code (with a little formatting to insert the inner loop).
For f:
2 0 BUILD_LIST 0
2 LOAD_FAST 0 (.0)
>> 4 FOR_ITER 36 (to 42)
6 STORE_DEREF 0 (s)
8 LOAD_GLOBAL 0 (d)
10 LOAD_ATTR 1 (get)
12 LOAD_GLOBAL 2 (next)
14 LOAD_CLOSURE 0 (s)
16 BUILD_TUPLE 1
18 LOAD_CONST 0 (<code object <genexpr> at 0x7ff191b1d8a0, file "test.py", line 2>)
2 0 LOAD_FAST 0 (.0)
>> 2 FOR_ITER 18 (to 22)
4 STORE_FAST 1 (k)
6 LOAD_FAST 1 (k)
8 LOAD_DEREF 0 (s)
10 COMPARE_OP 6 (in)
12 POP_JUMP_IF_FALSE 2
14 LOAD_FAST 1 (k)
16 YIELD_VALUE
18 POP_TOP
20 JUMP_ABSOLUTE 2
>> 22 LOAD_CONST 0 (None)
24 RETURN_VALUE
20 LOAD_CONST 1 ('f.<locals>.<listcomp>.<genexpr>')
22 MAKE_FUNCTION 8
24 LOAD_GLOBAL 0 (d)
26 GET_ITER
28 CALL_FUNCTION 1
30 LOAD_CONST 2 (None)
32 CALL_FUNCTION 2
34 LOAD_DEREF 0 (s)
36 CALL_FUNCTION 2
38 LIST_APPEND 2
40 JUMP_ABSOLUTE 4
>> 42 RETURN_VALUE
For g:
3 0 BUILD_LIST 0
2 LOAD_FAST 0 (.0)
>> 4 FOR_ITER 32 (to 38)
6 STORE_DEREF 0 (s)
8 LOAD_GLOBAL 0 (next)
10 LOAD_CLOSURE 0 (s)
12 BUILD_TUPLE 1
14 LOAD_CONST 0 (<code object <genexpr> at 0x7ff1905171e0, file "test.py", line 3>)
3 0 LOAD_FAST 0 (.0)
>> 2 FOR_ITER 22 (to 26)
4 UNPACK_SEQUENCE 2
6 STORE_FAST 1 (k)
8 STORE_FAST 2 (v)
10 LOAD_FAST 1 (k)
12 LOAD_DEREF 0 (s)
14 COMPARE_OP 6 (in)
16 POP_JUMP_IF_FALSE 2
18 LOAD_FAST 2 (v)
20 YIELD_VALUE
22 POP_TOP
24 JUMP_ABSOLUTE 2
>> 26 LOAD_CONST 0 (None)
28 RETURN_VALUE
16 LOAD_CONST 1 ('g.<locals>.<listcomp>.<genexpr>')
18 MAKE_FUNCTION 8
20 LOAD_GLOBAL 1 (d)
22 LOAD_ATTR 2 (items)
24 CALL_FUNCTION 0
26 GET_ITER
28 CALL_FUNCTION 1
30 LOAD_DEREF 0 (s)
32 CALL_FUNCTION 2
34 LIST_APPEND 2
36 JUMP_ABSOLUTE 4
>> 38 RETURN_VALUE
One can see that (again as suggested by @abarnert) the inner loop of g has some extra cost:
- (hidden) the construction of the 2-tuples by the iterator on d.items()
- an UNPACK_SEQUENCE 2, which unpacks those 2-tuples and puts k and v on the stack
- two STORE_FAST, which pop k and v from the stack to store them in the fast local slots (the ones named by co_varnames)
All of this happens before it finally loads k to compare it with s, as in f. This inner loop runs up to |lst|*|d| times, and it seems that those operations make the difference.
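The difference between the two inner loops can also be checked programmatically instead of by reading the listings. The sketch below compares the opcode names of the two generator expressions; find_genexpr and inner_opnames are helper names I made up, and the exact opcode names and counts depend on the Python version:

```python
import dis
import types

def find_genexpr(code):
    """Return the first nested code object named '<genexpr>', or None."""
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            if const.co_name == "<genexpr>":
                return const
            found = find_genexpr(const)
            if found is not None:
                return found
    return None

def inner_opnames(src):
    """Compile `src` (without running it) and list the genexpr's opcode names."""
    genexpr = find_genexpr(compile(src, "<dis>", "eval"))
    return [ins.opname for ins in dis.get_instructions(genexpr)]

f_ops = inner_opnames("[d.get(next((k for k in d if k in s), None), s) for s in lst]")
g_ops = inner_opnames("[next((v for k, v in d.items() if k in s), s) for s in lst]")

print("opcodes only in g's inner loop:", sorted(set(g_ops) - set(f_ops)))
print(len(f_ops), "instructions in f's genexpr,", len(g_ops), "in g's")
```

This only counts bytecode instructions; it does not show the hidden tuple construction done by the d.items() iterator.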
If this were optimized as I thought it was, the d.items() iterator would first put k on the stack to test k in s and then, only if k in s was true, put v on the stack for the YIELD_VALUE.
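The behaviour I had in mind can be written by hand: iterate over the keys only, and fetch the value only for the key that matches. Here is a sketch with toy data; the name h and the data are mine, and I have not benchmarked this variant against the faker data set above:

```python
# Toy data, assumed only for this illustration.
d = {"cat": "feline", "dog": "canine"}
lst = ["bobcat", "hotdog", "bird"]

def f():
    return [d.get(next((k for k in d if k in s), None), s) for s in lst]

def g():
    return [next((v for k, v in d.items() if k in s), s) for s in lst]

def h():
    # Hypothetical variant: no tuples are built or unpacked in the inner
    # loop, and d[k] is evaluated only for the first key found in s.
    return [next((d[k] for k in d if k in s), s) for s in lst]

print(h())  # ['feline', 'canine', 'bird']
```

Like f, this pays one dictionary lookup per matching s, but it avoids the d.get call with a None key when nothing matches.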