
Suggestion: use standard dict with Python >= 3.7 #144

Open
mpyatishev opened this issue May 22, 2019 · 7 comments

Comments

@mpyatishev

Since dict preserves insertion order as of Python 3.7, it would be good to use a standard dict instead of OrderedDict: this would save memory and slightly improve performance. That said, at a size of 128 elements it probably does not matter much.

$ ipython
Python 3.7.3 (default, Apr  3 2019, 05:39:12) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.5.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from collections import OrderedDict

In [2]: import sys

In [3]: std_dict = {i: object() for i in range(128)}

In [4]: ord_dict = OrderedDict((i, object()) for i in range(128))

In [5]: sys.getsizeof(std_dict)
Out[5]: 4704

In [6]: sys.getsizeof(ord_dict)
Out[6]: 10912

In [7]: %timeit for i in range(128): i in std_dict
3.79 µs ± 6.01 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [8]: %timeit for i in range(128): i in ord_dict
4.17 µs ± 44.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In addition, there seems to be an error in the _cache_hit method: the key should probably be moved to the beginning of the dictionary, not the end. Although I am not sure the direction of the move has any serious effect on such small amounts of data.
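For reference, a minimal sketch (standalone, not async_lru's actual code) of what the direction of the move means for eviction: in the usual LRU pattern a hit key is moved to the end of the OrderedDict, and popitem(last=False) then evicts from the front, the least-recently-used side.

```python
from collections import OrderedDict

# Four cached entries, inserted in order 0, 1, 2, 3.
cache = OrderedDict((i, i) for i in range(4))

# A hit on key 1: move it to the end (most-recently-used side).
cache.move_to_end(1)                 # order is now 0, 2, 3, 1

# Eviction takes from the front (least-recently-used side).
evicted = cache.popitem(last=False)
print(evicted)        # (0, 0)
print(list(cache))    # [2, 3, 1]
```

Moving the hit key to the beginning instead, while keeping popitem(last=False), would evict the most recently used entry, which is the opposite of LRU behavior.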

@asvetlov
Member

Switching between two dict implementations depending on the Python version is an additional maintenance burden.

Do you have performance numbers not for dict vs OrderedDict but for async_lru itself?

@mpyatishev
Author

Switching between two dict implementations depending on the Python version is an additional maintenance burden.

I fully agree with this. So it's just a suggestion.

Do you have performance numbers not for dict vs OrderedDict but for async_lru itself?

No, I don't have those measurements.

@asvetlov
Member

No, I don't have those measurements.

Please do it. Evidence of an actual speedup is crucial for making the decision.

@mpyatishev
Author

Please do it. Evidence of an actual speedup is crucial for making the decision.

Here you are:

$ ipython
Python 3.7.3 (default, Apr  3 2019, 05:39:12)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.5.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import asyncio

In [2]: import sys

In [3]: from dataclasses import dataclass

In [4]: from async_lru import alru_cache

In [5]: @dataclass
   ...: class Obj:
   ...:     i: int
   ...: 

In [6]: @alru_cache()
   ...: async def ord_dict(i):
   ...:     return Obj(i)
   ...: 

In [7]: @alru_cache(dict_cls=dict)
   ...: async def std_dict(i):
   ...:     return Obj(i)
   ...: 

In [8]: async def test_ord_dict():
   ...:     for i in range(128):
   ...:         await ord_dict(i)
   ...: 

In [9]: async def test_std_dict():
   ...:     for i in range(128):
   ...:         await std_dict(i)
   ...: 

In [10]: loop = asyncio.get_event_loop()

In [11]: %timeit loop.run_until_complete(test_ord_dict())
201 µs ± 1.26 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [12]: %timeit loop.run_until_complete(test_std_dict())
202 µs ± 2.52 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [13]: ord_dict.cache_info()
Out[13]: CacheInfo(hits=10382080, misses=128, maxsize=128, currsize=128)

In [14]: std_dict.cache_info()
Out[14]: CacheInfo(hits=1038080, misses=128, maxsize=128, currsize=128)

In [15]: sys.getsizeof(ord_dict._cache)
Out[15]: 10912

In [16]: sys.getsizeof(std_dict._cache)
Out[16]: 4704

As you can see, performance with the standard dictionary is essentially unchanged, but memory consumption is significantly reduced.
Note: I had to comment out the __cache_touch calls in _cache_hit and _cache_miss.

@ja8zyjits

@mpyatishev the ord_dict loop count and hits are 10x higher than std_dict's. Was that a copy-paste error, or am I missing something?

@asvetlov
Member

The library uses OrderedDict.popitem(last=False) for its FIFO eviction strategy, while dict.popitem() supports LIFO only. Thus, a plain replacement is not possible.
functools.lru_cache uses a simplified version of OrderedDict: it stores (PREV, NEXT, KEY, RESULT) links of a doubly linked list in a plain dict.
This approach is definitely viable, but it requires non-zero effort for the initial implementation and future support.

I personally don't want to work on it, but I will be happy to review a PR that replaces OrderedDict with a structure similar to the one used by lru_cache.
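To make the suggested structure concrete, here is a rough sketch of that approach: a plain dict mapping keys to links of a circular doubly linked list, each link a [PREV, NEXT, KEY, RESULT] list, in the spirit of CPython's functools.lru_cache. This is an illustration only, not async_lru's code; the class name and methods are hypothetical.

```python
PREV, NEXT, KEY, RESULT = 0, 1, 2, 3

class LinkedLRU:
    """Minimal LRU cache: plain dict + circular doubly linked list."""

    def __init__(self, maxsize):
        self.maxsize = maxsize
        self.cache = {}
        self.root = []                                  # sentinel node
        self.root[:] = [self.root, self.root, None, None]

    def get(self, key):
        link = self.cache.get(key)
        if link is None:
            return None
        # Unlink the hit entry and reinsert it just before root
        # (the most-recently-used position).
        prev, nxt = link[PREV], link[NEXT]
        prev[NEXT], nxt[PREV] = nxt, prev
        last = self.root[PREV]
        last[NEXT] = self.root[PREV] = link
        link[PREV], link[NEXT] = last, self.root
        return link[RESULT]

    def put(self, key, value):
        if len(self.cache) >= self.maxsize:
            # Evict the LRU entry: the link right after root.
            oldest = self.root[NEXT]
            self.root[NEXT] = oldest[NEXT]
            oldest[NEXT][PREV] = self.root
            del self.cache[oldest[KEY]]
        last = self.root[PREV]
        link = [last, self.root, key, value]
        last[NEXT] = self.root[PREV] = link
        self.cache[key] = link
```

The win over OrderedDict is that links are bare 4-element lists rather than full node objects, and both the move-to-MRU and evict-LRU operations are O(1) pointer updates.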

@WH-2099

WH-2099 commented Jul 9, 2023

I believe this substitution is unnecessary. In the current project, __cache needs not only to preserve insertion order but also to reorder existing entries on access. This is precisely the use case OrderedDict remains suited for in 3.7+ (via move_to_end() and popitem()).
https://docs.python.org/3/library/collections.html#ordereddict-objects

The OrderedDict algorithm can handle frequent reordering operations better than dict. As shown in the recipes below, this makes it suitable for implementing various kinds of LRU caches.
