
Suggestion: use standard dict with Python >= 3.7 #144

Open
mpyatishev opened this issue May 22, 2019 · 7 comments

Comments

@mpyatishev

Since dict preserves insertion order as of Python 3.7, it would be good to use a standard dict instead of OrderedDict: this would save memory and slightly improve performance. That said, at a size of 128 elements it probably does not matter much.

$ ipython
Python 3.7.3 (default, Apr  3 2019, 05:39:12) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.5.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from collections import OrderedDict

In [2]: import sys

In [3]: std_dict = {i: object() for i in range(128)}

In [4]: ord_dict = OrderedDict((i, object()) for i in range(128))

In [5]: sys.getsizeof(std_dict)
Out[5]: 4704

In [6]: sys.getsizeof(ord_dict)
Out[6]: 10912

In [7]: %timeit for i in range(128): i in std_dict
3.79 µs ± 6.01 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [8]: %timeit for i in range(128): i in ord_dict
4.17 µs ± 44.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In addition, there seems to be an error in the _cache_hit method: the key should probably be moved to the beginning of the dictionary, not the end. Although I am not sure the direction of the move has any serious effect on such small amounts of data.
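For reference, a minimal sketch (standalone, not async_lru's actual code) of what the direction of the move means for eviction: in the usual LRU pattern a hit key is moved to the end of the OrderedDict, and popitem(last=False) then evicts from the front, the least-recently-used side.

```python
from collections import OrderedDict

# Four cached entries, inserted in order 0, 1, 2, 3.
cache = OrderedDict((i, i) for i in range(4))

# A hit on key 1: move it to the end (most-recently-used side).
cache.move_to_end(1)                 # order is now 0, 2, 3, 1

# Eviction takes from the front (least-recently-used side).
evicted = cache.popitem(last=False)
print(evicted)        # (0, 0)
print(list(cache))    # [2, 3, 1]
```

Moving the hit key to the beginning instead, while keeping popitem(last=False), would evict the most recently used entry, which is the opposite of LRU behavior.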

@asvetlov
Member

Switching between two dict implementations depending on the Python version is an additional maintenance burden.

Do you have performance numbers not for dict vs OrderedDict but for async_lru itself?

@mpyatishev
Author

Switching between two dict implementations depending on the Python version is an additional maintenance burden.

I fully agree with this. So it's just a suggestion.

Do you have performance numbers not for dict vs OrderedDict but for async_lru itself?

No, I don't have those measurements.

@asvetlov
Member

No, I don't have those measurements.

Please do it. Evidence of an actual speedup is crucial for making the decision.

@mpyatishev
Author

Please do it. Evidence of an actual speedup is crucial for making the decision.

Here you are:

$ ipython
Python 3.7.3 (default, Apr  3 2019, 05:39:12)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.5.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import asyncio

In [2]: import sys

In [3]: from dataclasses import dataclass

In [4]: from async_lru import alru_cache

In [5]: @dataclass
   ...: class Obj:
   ...:     i: int
   ...: 

In [6]: @alru_cache()
   ...: async def ord_dict(i):
   ...:     return Obj(i)
   ...: 

In [7]: @alru_cache(dict_cls=dict)
   ...: async def std_dict(i):
   ...:     return Obj(i)
   ...: 

In [8]: async def test_ord_dict():
   ...:     for i in range(128):
   ...:         await ord_dict(i)
   ...: 

In [9]: async def test_std_dict():
   ...:     for i in range(128):
   ...:         await std_dict(i)
   ...: 

In [10]: loop = asyncio.get_event_loop()

In [11]: %timeit loop.run_until_complete(test_ord_dict())
201 µs ± 1.26 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [12]: %timeit loop.run_until_complete(test_std_dict())
202 µs ± 2.52 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [13]: ord_dict.cache_info()
Out[13]: CacheInfo(hits=10382080, misses=128, maxsize=128, currsize=128)

In [14]: std_dict.cache_info()
Out[14]: CacheInfo(hits=1038080, misses=128, maxsize=128, currsize=128)

In [15]: sys.getsizeof(ord_dict._cache)
Out[15]: 10912

In [16]: sys.getsizeof(std_dict._cache)
Out[16]: 4704

As you can see, performance with the standard dictionary is essentially unchanged, but memory consumption is significantly reduced.
Note: I had to comment out the __cache_touch calls in _cache_hit and _cache_miss.

@ja8zyjits

@mpyatishev the ord_dict loop count and hits are 10x higher than std_dict's. Was that a copy-paste error, or am I missing something?

@asvetlov
Member

The library uses OrderedDict.popitem(last=False) for its FIFO eviction strategy, while dict.popitem() supports LIFO only. Thus, a plain replacement is not possible.
functools.lru_cache uses a simplified version of OrderedDict: it stores (PREV, NEXT, KEY, RESULT) links of a doubly linked list in a plain dict.
This approach is definitely viable, but it requires non-zero effort for the initial implementation and future support.

I personally don't want to work on it, but I will be happy to review a PR that replaces OrderedDict with a structure similar to the one used by lru_cache.
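To make the suggested structure concrete, here is a rough sketch of that approach: a plain dict mapping keys to links of a circular doubly linked list, each link a [PREV, NEXT, KEY, RESULT] list, in the spirit of CPython's functools.lru_cache. This is an illustration only, not async_lru's code; the class name and methods are hypothetical.

```python
PREV, NEXT, KEY, RESULT = 0, 1, 2, 3

class LinkedLRU:
    """Minimal LRU cache: plain dict + circular doubly linked list."""

    def __init__(self, maxsize):
        self.maxsize = maxsize
        self.cache = {}
        self.root = []                                  # sentinel node
        self.root[:] = [self.root, self.root, None, None]

    def get(self, key):
        link = self.cache.get(key)
        if link is None:
            return None
        # Unlink the hit entry and reinsert it just before root
        # (the most-recently-used position).
        prev, nxt = link[PREV], link[NEXT]
        prev[NEXT], nxt[PREV] = nxt, prev
        last = self.root[PREV]
        last[NEXT] = self.root[PREV] = link
        link[PREV], link[NEXT] = last, self.root
        return link[RESULT]

    def put(self, key, value):
        if len(self.cache) >= self.maxsize:
            # Evict the LRU entry: the link right after root.
            oldest = self.root[NEXT]
            self.root[NEXT] = oldest[NEXT]
            oldest[NEXT][PREV] = self.root
            del self.cache[oldest[KEY]]
        last = self.root[PREV]
        link = [last, self.root, key, value]
        last[NEXT] = self.root[PREV] = link
        self.cache[key] = link
```

The win over OrderedDict is that links are bare 4-element lists rather than full node objects, and both the move-to-MRU and evict-LRU operations are O(1) pointer updates.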

@WH-2099

WH-2099 commented Jul 9, 2023

I believe this substitution is unnecessary. In the current project, __cache needs not only to preserve insertion order but also to reorder existing entries on access. This is precisely the use case OrderedDict remains suited for in 3.7+ (via move_to_end() and popitem()).
https://docs.python.org/3/library/collections.html#ordereddict-objects

The OrderedDict algorithm can handle frequent reordering operations better than dict. As shown in the recipes below, this makes it suitable for implementing various kinds of LRU caches.
