Tools to transform classes

Wrapping methods if they exist

Consider cache_iter as it is:

from functools import partial

# Note: Store (see py2store.base), get_class_name, and lazyprop are py2store
# internals; _define_keys_values_and_items_according_to_iter is defined
# further down this page.

def cache_iter(store=None, iter_to_container=list, name=None):
    if store is None:
        return partial(cache_iter, iter_to_container=iter_to_container, name=name)
    elif not isinstance(store, type):  # then consider it to be an instance
        store_instance = store
        WrapperStore = cache_iter(Store, iter_to_container=iter_to_container, name=name)
        return WrapperStore(store_instance)
    else:
        store_cls = store
        name = name or 'IterCached' + get_class_name(store_cls)

        ###### Difference starts here ###############################
        cached_cls = type(name, (store_cls,), {'_iter_cache': None})

        @lazyprop
        def _iter_cache(self):
            return iter_to_container(super(cached_cls, self).__iter__())  # TODO: Should it be iter(super(...)?

        def __iter__(self):
            yield from self._iter_cache

        def __len__(self):
            return len(self._iter_cache)

        cached_cls.__iter__ = __iter__
        cached_cls.__len__ = __len__
        cached_cls._iter_cache = _iter_cache

        _define_keys_values_and_items_according_to_iter(cached_cls)

        return cached_cls
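
For reference, lazyprop (used above) is a cached-property descriptor. Here's a minimal sketch of the pattern (py2store's actual implementation may differ in details): compute the value on first access, then stash it in the instance's __dict__ so the descriptor isn't consulted again.

class lazyprop:
    def __init__(self, func):
        self.func = func
        self.__name__ = func.__name__

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class, not an instance
        # compute once, then cache on the instance; since this is a
        # non-data descriptor, the instance attribute shadows it from now on
        value = self.func(instance)
        instance.__dict__[self.__name__] = value
        return value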

It seems the following, if made to work, would be more straightforward:

def cache_iter(store=None, iter_to_container=list, name=None):
    if store is None:
        return partial(cache_iter, iter_to_container=iter_to_container, name=name)
    elif not isinstance(store, type):  # then consider it to be an instance
        store_instance = store
        WrapperStore = cache_iter(Store, iter_to_container=iter_to_container, name=name)
        return WrapperStore(store_instance)
    else:
        store_cls = store
        name = name or 'IterCached' + get_class_name(store_cls)

        ###### Difference starts here ###############################
        class IterCached(_DefineKeysValuesAndItemsAccordingToIter, store_cls):
            @lazyprop
            def _iter_cache(self):
                return iter_to_container(super(IterCached, self).__iter__())  # TODO: Should it be iter(super(...)?

            def __iter__(self):
                yield from self._iter_cache

            def __len__(self):
                return len(self._iter_cache)

        # get rid of attributes that didn't already exist in the old class
        # (this loop is where it breaks; see the note after this snippet)
        old_cls_attrs = set(dir(store_cls))
        for attr in dir(IterCached):
            if attr not in old_cls_attrs:
                print(f"Deleting {attr}")
                delattr(IterCached, attr)

        if name is not None:  # note: name is always set above, so this check is redundant
            IterCached.__qualname__ = name

        return IterCached
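
The "get rid of attributes" loop is the part that doesn't work as written: delattr on a class only removes attributes defined in that class's own __dict__, and raises AttributeError for inherited ones, which is exactly what the mixin's keys/values/items are. A minimal repro:

class Base:
    def keys(self):
        return []

class Child(Base):
    pass

delattr(Child, 'keys')  # AttributeError: 'keys' lives on Base, not on Child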

Where

def _define_keys_values_and_items_according_to_iter(cls):
    if hasattr(cls, 'keys'):
        def keys(self):
            yield from self.__iter__()  # TODO: Should it be iter(self)?

        cls.keys = keys

    if hasattr(cls, 'values'):
        def values(self):
            yield from (self[k] for k in self)

        cls.values = values

    if hasattr(cls, 'items'):
        def items(self):
            yield from ((k, self[k]) for k in self)

        cls.items = items


class _DefineKeysValuesAndItemsAccordingToIter:
    def keys(self):
        yield from self.__iter__()  # TODO: Should it be iter(self)?

    def values(self):
        yield from (self[k] for k in self)

    def items(self):
        yield from ((k, self[k]) for k in self)

The _define_keys_values_and_items_according_to_iter approach seems less DRY: lots of repetition of the same pattern. It's also not extensible to an arbitrary set of methods.

Also considered this:

def wrap_cls_methods(*funcs, **named_funcs):
    funcs_with_names = {func.__name__: func for func in funcs}
    named_funcs = dict(funcs_with_names, **named_funcs)

    def cls_methods_wrapper(cls):
        for name, func in named_funcs.items():
            if hasattr(cls, name):  # only replace the method if the class already has it
                setattr(cls, name, func)
        return cls  # return the class so this can be used as a decorator
    return cls_methods_wrapper

Example usage:

A = type('A', (dict,), {})

def __iter__(self):
    yield from ['a', 'c']
    
def items(self):
    yield from ((k, self[k]) for k in self)
    
wrap_cls_methods(__iter__, items)(A)
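
For example:

>>> a = A(a=1, b=2, c=3)
>>> list(a)  # __iter__ was replaced, so only 'a' and 'c' show up
['a', 'c']
>>> dict(a.items())  # items follows the new __iter__
{'a': 1, 'c': 3}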

But really, there are only a few small advantages over using

A = type('A', (dict,), {'__iter__': __iter__, 'items': items})

  • In the latter, we have to spell out each method's name explicitly (no choice between named and unnamed), so it's less DRY.
  • In the latter, we can't say "only add this method if the base class already has it".
  • The decorator syntax might be clearer.
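
For instance, with the return cls line added above, the decorator form would read:

@wrap_cls_methods(__iter__, items)
class A(dict):
    pass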

Best way to set up the (base class & layers) system

delegation

Right now I've settled mainly on delegation (though I still do some subclassing). But my delegation is only partial, not "complete". See py2store.base.Store: it wraps a store instance and redefines the interface methods in such a way that they go through the wrapped store's interface methods (__getitem__, __setitem__, etc.).

I settled on this after seriously trying out different approaches; it's the one that gave me the least resistance. But with age, it's starting to feel quite limited.

Possible routes of improvements:

  • Adding an "and forward everything else" functionality to Stores, so they don't filter out all non-declared methods (see the sketch after this list). See ideas in this blog post.
  • Adding better forwarding of modules and names. Once wrapped in a Store, everything shows up as an abc.Whatever, which is annoying. How would I like it to show up? Simple: if a class named MyClass is defined in my_module, I'd like it to show up as my_module.MyClass.
  • Coming back to a decorator and/or inheritance-based approach, possibly with metaclasses. Perhaps delegation is not the way; Python doesn't seem to support delegation very well (see this stackoverflow answer). If I take this approach, some serious debug-supporting tools will need to be integrated, because the inheritance approach became a mess, and the decorator approaches definitely needed easier introspection tools.
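
To sketch the first bullet: a __getattr__ that forwards any non-declared name to the wrapped store. This is a minimal illustration of the idea (a hypothetical ForwardingStore, not py2store.base.Store's actual code):

class ForwardingStore:
    def __init__(self, store):
        self.store = store

    # declared interface methods go through the wrapped store...
    def __getitem__(self, k):
        return self.store[k]

    def __setitem__(self, k, v):
        self.store[k] = v

    def __iter__(self):
        return iter(self.store)

    def __len__(self):
        return len(self.store)

    # ...and everything else is forwarded instead of being filtered out
    def __getattr__(self, attr):
        # only called when normal lookup fails, i.e. for names this wrapper
        # doesn't itself declare; note that implicit dunder calls (e.g. k in s)
        # look the method up on the type, so they aren't forwarded by __getattr__
        return getattr(self.store, attr)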

collections.abc interface

What is a good approach to create base collections.abc interfaces that can be easily wrapped into key/value transformers?

Consider the following code that provides a Mapping interface to listing and reading files.

from collections.abc import Mapping
import os

class FileReader(Mapping):
    """Read data from files under a given rootdir. Keys must be absolute file paths.
    """
    def __init__(self, rootdir):
        self.rootdir = rootdir

    def __getitem__(self, k):
        with open(k, 'rb') as fp:
            data = fp.read()
        return data

    def __iter__(self):
        # yield the full path of each plain file directly under rootdir
        for name in os.listdir(self.rootdir):
            path = os.path.join(self.rootdir, name)
            if os.path.isfile(path):
                yield path

    def __len__(self):
        return sum(1 for _ in self)

A demo of what you can do with this:

>>> fr = FileReader('/Users/twhalen/tmp/')
>>> list(fr)  # list (full) file paths in folder
['/Users/twhalen/tmp/example.py', '/Users/twhalen/tmp/some.pkl']
>>> fr['/Users/twhalen/tmp/example.py']  # get binary contents of file
b'import collections as my_favorite_module\nprint("hello world")\n'
>>> fr['/Users/twhalen/tmp/some.pkl']  # get binary contents of file
b'\x80\x03]q\x00(K\x01]q\x01(K\x02K\x03ee.'

We can change much of the behavior of the above class simply by specifying how to transform keys and/or values.

For example, if we want keys to be expressed as relative paths instead of full paths, and values coming from a .pkl file to be deserialized with pickle, we simply need to wrap our class using these functions:

import pickle
from io import BytesIO

def newkey_to_oldkey(self, newkey):
    return os.path.join(self.rootdir, newkey)

def oldkey_to_newkey(self, oldkey):
    if oldkey.startswith(self.rootdir):
        return oldkey[len(self.rootdir):]
    else:
        raise ValueError(f"{oldkey} should start with {self.rootdir}")

def oldval_to_newval(self, k, v):
    if k.endswith('.pkl'):
        return pickle.load(BytesIO(v))
    return v

One way we could wrap our original class is to do this:

class MyFileReader(FileReader):
    def __getitem__(self, k):
        oldkey = newkey_to_oldkey(self, k)  # transform the incoming key (make a full path)
        oldval = super().__getitem__(oldkey)  # call the parent's getitem with the oldkey
        return oldval_to_newval(self, k, oldval)  # transform oldval before returning it to the user

    def __iter__(self):
        yield from (oldkey_to_newkey(self, oldkey) for oldkey in super().__iter__())

Demo:

>>> fr = MyFileReader('/Users/twhalen/tmp/')
>>> list(fr)
['example.py', 'some.pkl']
>>> fr['example.py']  # contents of file (still raw binary)
b'import collections as my_favorite_module\nprint("hello world")\n'
>>> fr['some.pkl']  # contents of pkl file is given already deserialized!
[1, [2, 3]]

It's fine to do this with a small example, but it won't do if I have many different MutableMapping classes containing more than a few key/value methods, and I want to offer users an easy, fail-safe way to wrap those classes with their own key/value transformers.

Something more "automated" is in order. I envision the base MutableMapping classes being annotated with Key and Value types that would allow appropriately annotated key/value transformers to "automatically know" where to be applied.
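
As a very first step in that direction, here's a hypothetical wrap_kv_transforms factory that generalizes the MyFileReader pattern above (the name and signature are illustrative, not an actual py2store API):

def wrap_kv_transforms(store_cls, newkey_to_oldkey=None, oldkey_to_newkey=None,
                       oldval_to_newval=None, name=None):
    """Subclass store_cls so that keys and values are passed through the
    given (optional) transformers, as MyFileReader did by hand."""
    class WrappedStore(store_cls):
        def __getitem__(self, k):
            oldkey = newkey_to_oldkey(self, k) if newkey_to_oldkey else k
            oldval = super().__getitem__(oldkey)
            return oldval_to_newval(self, k, oldval) if oldval_to_newval else oldval

        def __iter__(self):
            for oldkey in super().__iter__():
                yield oldkey_to_newkey(self, oldkey) if oldkey_to_newkey else oldkey

    WrappedStore.__qualname__ = name or 'Wrapped' + store_cls.__name__
    return WrappedStore

MyFileReader = wrap_kv_transforms(FileReader, newkey_to_oldkey, oldkey_to_newkey, oldval_to_newval)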

I'm not sure what mix of subclassing, delegation, decorators, and possibly even meta-classes, would achieve such elegance.