Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Item model utils #269

Open
wants to merge 10 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -121,3 +121,6 @@ ENV/

# PyCharm metadata
.idea/

# Vim
*.swp
224 changes: 224 additions & 0 deletions dcicutils/item_model_utils.py
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might suggest either model_utils or item_utils. The name doesn't have to be complex, just enough to distinguish it from other things we're likely to have. Are we likely to have any other kinds of models that we want utils for? Or any other kinds of items? I'm OK leaving it if you want to, I'll leave that to you since you'll use it. But it just seems clumsy and like unnecessarily many letters.

Original file line number Diff line number Diff line change
@@ -0,0 +1,224 @@
from __future__ import annotations

from dataclasses import dataclass, field
from functools import lru_cache
from typing import Any, Dict, Iterable, List, Mapping, Optional, Tuple, Union

import structlog

from . import ff_utils


logger = structlog.getLogger(__name__)

LinkTo = Union[str, Dict[str, Any]]


def get_link_to(
existing_item: PortalItem,
link_to: LinkTo,
item_to_create: PortalItem,
) -> Union[str, PortalItem]:
Comment on lines +17 to +21
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a matter of personal preference, I'd like to see this indented like this:

Suggested change
def get_link_to(
existing_item: PortalItem,
link_to: LinkTo,
item_to_create: PortalItem,
) -> Union[str, PortalItem]:
def get_link_to(existing_item: PortalItem,
link_to: LinkTo,
item_to_create: PortalItem,
) -> Union[str, PortalItem]:

The reason I like this style better is:

  1. It leaves "def" and the function name easier to pick out of the crowd.
  2. It means that the only thing in column 0 after a def is the next definition.

"""Create new item model from existing one for given linkTo.

LinkTos be identifiers (e.g. UUIDs) or (partially) embedded objects.

Follow rules of existing item model for fetching linkTo via
request. If not fetching via request, then make item model from
existing properties if possible.
"""
Comment on lines +22 to +29
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are referring to parameters by their type rather than their name. I get why that is. The name is somewhat encouraged to be a python name and fights the notation used in the JSON. But in the worst case this style can be confusing if there's more than one thing with the same type, and falling back on different notation in that case can be irregular. I tend to think of documentation as being program-like and it's useful to refer to things that are parameters by the names of the parameter so the reader of the doc doesn't have to guess. e.g.,

Suggested change
"""Create new item model from existing one for given linkTo.
LinkTos be identifiers (e.g. UUIDs) or (partially) embedded objects.
Follow rules of existing item model for fetching linkTo via
request. If not fetching via request, then make item model from
existing properties if possible.
"""
"""Create new item model from existing_item for given link_to.
A link_to should be an identifier (e.g. a UUID) or a (partially) embedded object.
Follow rules of existing item model for fetching linkTo via
request. If not fetching via request, then make item model from
existing properties if possible.
"""

Note also that you referred in your text to LinkTos in plural, which is awkward to start because you are pluralizing a type name and there is no such plural name, but also the argument doesn't seem to allow multiple link_to ids, just one.

I was unable to interpret what the sentence starting "Follow rules of..." even means, so I left it alone. I also don't quite get how an existing item (which is presumably an object?) can also be an item to create (which presumably can't be an object (because it doesn't exist yet). Are these item types?

I am surprised PortalItem works as a forward reference. Did you try that in Python 3.7? I thought that did not work if you're using it in a type description but maybe it does if it's a class. I don't think it works for a free-standing assignment such as you did LinkTo if it follows. Usually you have to use 'PortalItem' instead for forward references.

fetch_links = existing_item.should_fetch_links()
identifier = get_item_identifier(link_to)
if fetch_links and identifier:
return item_to_create.from_identifier_and_existing_item(
identifier, existing_item
)
if isinstance(link_to, Mapping):
return item_to_create.from_properties_and_existing_item(link_to, existing_item)
return link_to


def get_item_identifier(item: LinkTo) -> str:
if isinstance(item, str):
return item
if isinstance(item, Mapping):
return item.get(PortalItem.UUID, "")


@dataclass(frozen=True)
class PortalItem:
ACCESSION = "accession"
AT_ID = "@id"
TYPE = "@type"
UUID = "uuid"

properties: Dict[str, Any]
auth: Optional[Dict[str, Any]] = field(default=None, hash=False)
fetch_links: Optional[bool] = field(default=False, hash=False)

def __repr__(self) -> str:
return f"{self.__class__.__name__}({self._uuid})"

@property
def _uuid(self) -> str:
return self.properties.get(self.UUID, "")

@property
def _at_id(self) -> str:
return self.properties.get(self.AT_ID, "")

@property
def _accession(self) -> str:
return self.properties.get(self.ACCESSION, "")

@property
def _types(self) -> List[str]:
return self.properties.get(self.TYPE, [])

def should_fetch_links(self) -> bool:
return self.fetch_links

def get_auth(self) -> Union[Dict[str, Any], None]:
return self.auth

def get_properties(self) -> Dict[str, Any]:
return self.properties

def get_accession(self) -> str:
return self._accession

def get_uuid(self) -> str:
return self._uuid

def get_at_id(self) -> str:
return self._at_id

def get_types(self) -> List[str]:
return self._types

def _get_link_tos(
self, link_tos: Iterable[LinkTo], item_to_create: PortalItem
) -> List[Union[str, PortalItem]]:
Comment on lines +99 to +101
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't mark all of the occasions of this issue in this PR, like I did in its companion, but there's no reason for this to extend over three lines. The function header would fit on one line. If you did have to break it, it shouldn't go to column zero (as I explained in more detail elsewhere).

return [self._get_link_to(link_to, item_to_create) for link_to in link_tos]

def _get_link_to(
self,
link_to: LinkTo,
item_to_create: PortalItem,
) -> Union[str, PortalItem]:
return get_link_to(self, link_to, item_to_create)

@classmethod
def from_properties(
cls,
properties: Dict[str, Any],
fetch_links: bool = False,
auth: Dict[str, Any] = None,
**kwargs: Any,
) -> PortalItem:
return cls(properties, fetch_links=fetch_links, auth=auth)

@classmethod
def from_identifier_and_auth(
cls, identifier: str, auth: Dict[str, Any], fetch_links=False, **kwargs: Any
) -> PortalItem:
properties = cls._get_item_via_auth(identifier, auth)
return cls.from_properties(properties, auth=auth, fetch_links=fetch_links)

@classmethod
def _get_item_via_auth(
cls,
identifier: str,
auth: Dict[str, Any],
add_on: Optional[str] = "frame=object",
) -> Dict[str, Any]:
hashable_auth = cls._make_hashable_auth(auth)
return cls._get_and_cache_item_via_auth(identifier, hashable_auth, add_on)

@classmethod
def _make_hashable_auth(cls, auth: Mapping[str, str]) -> Tuple[Tuple[str, str]]:
"""Assuming nothing nested here."""
return tuple(auth.items())

@classmethod
def _undo_make_hashable_auth(
cls, hashable_auth: Tuple[Tuple[str, str]]
) -> Dict[str, Any]:
return dict(hashable_auth)

@classmethod
@lru_cache(maxsize=256)
def _get_and_cache_item_via_auth(
cls,
identifier: str,
hashable_auth: Tuple[Tuple[str, Any]],
add_on: Optional[str] = None,
) -> Dict[str, Any]:
"""Save on requests by caching items."""
auth = cls._undo_make_hashable_auth(hashable_auth)
try:
result = ff_utils.get_metadata(identifier, key=auth, add_on=add_on)
except Exception as e:
result = {}
logger.error(f"Error getting metadata for {identifier}: {e}")
return result

@classmethod
def from_identifier_and_existing_item(
cls, identifier: str, existing_item: PortalItem, **kwargs: Any
) -> PortalItem:
fetch_links = existing_item.should_fetch_links()
auth = existing_item.get_auth()
if auth:
return cls.from_identifier_and_auth(
identifier, auth, fetch_links=fetch_links
)
raise RuntimeError("Unable to fetch given identifier without auth key")

@classmethod
def from_properties_and_existing_item(
cls, properties: Dict[str, Any], existing_item: PortalItem, **kwargs: Any
) -> PortalItem:
fetch_links = existing_item.should_fetch_links()
auth = existing_item.get_auth()
return cls.from_properties(properties, fetch_links=fetch_links, auth=auth)


@dataclass(frozen=True)
class NestedProperty:
properties: Dict[str, Any]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like a blank line before this, since there are blank lines between the definitions. It just looks better, even if PEP8 might not require it.

parent_item: Optional[PortalItem] = field(default=None, hash=False)

def __repr__(self) -> str:
return f"{self.__class__.__name__}(parent={self.parent_item.__repr__()})"

def get_properties(self) -> Dict[str, Any]:
return self.properties

def get_parent_item(self) -> Union[PortalItem, None]:
return self.parent_item

def _get_link_tos(
self, link_tos: LinkTo, item_to_create: PortalItem
) -> List[Union[str, PortalItem]]:
return [self._get_link_to(link_to, item_to_create) for link_to in link_tos]

def _get_link_to(
self,
link_to: LinkTo,
item_to_create: PortalItem,
) -> Union[str, PortalItem]:
if self.parent_item:
return get_link_to(self.parent_item, link_to, item_to_create)
if isinstance(link_to, Mapping):
return item_to_create.from_properties(link_to)
return link_to

@classmethod
def from_properties(
cls,
properties: Dict[str, Any],
parent_item: Optional[PortalItem] = None,
**kwargs: Any,
) -> NestedProperty:
return cls(properties, parent_item=parent_item)
37 changes: 37 additions & 0 deletions dcicutils/testing_utils.py
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there some reason this would not go in qa_utils? I'm not sure another modules is needed.

Also, the reason qa_utils is not itself called testing test_utils or testing_utils is to avoid the use of the substring "test" in its name, so that its contents aren't seen as tests to run. Various tools differ on their precise treatment of filenames with test in them, but my feeling is that it's best to dodge use of ...test... where possible. So that's why I would put this in qa_utils.

Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
from contextlib import contextmanager
from types import ModuleType
from typing import Any, Iterator, Optional
from unittest import mock


AN_UNLIKELY_RETURN_VALUE = "unlikely return value"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ordinary convention for Python is to just do

AN_UNLIKELY_RETURN_VALUE = object()

because as soon as you name a string, it becomes interned and possible for someone else to find by name in a pointer-equal way. It's annoying at times. But this is why I added dcicutils.NamedObject, so that you can do something a little classier that is actually reliably unique (you could name it something less probabilistic, too):

_UNSUPPLIED = NamedObject('unsupplied')

where the token NamedObject has unique identity and prints in a more stringy way.

>>> x = NamedObject('unsupplied')
>>> str(x)
'<unsupplied>'
>>> repr(x)
'<unsupplied@101036400>'
>>> x
<unsupplied@101036400>

(Note that I recommend using _ in the name of this because in my long-time considered opinion it's a bad idea to export such a name so that different modules or classes use the same variable for missing/undefined/etc or else one might pass its missing argument explicitly to something else who might think no argument had been passed. The semantics gets quickly messy because it's not clear that an explicitly passed missing argument is intended to mean missing (though that's subject to intent in each such situation). Better each use of unsupplied should be a separate object in my view unless you've thought it through carefully and really mean to be creating a whole ecosystem that conspires on the notion of what a filled or missing argument is, and usually that's a distraction from work you're trying to do. So making the var non-exported/shared seems better to me.)



@contextmanager
def patch_context(
to_patch: object,
return_value: Any = AN_UNLIKELY_RETURN_VALUE,
module: Optional[ModuleType] = None,
**kwargs,
) -> Iterator[mock.MagicMock]:
"""Mock out the given object.

Essentially mock.patch_object with some hacks to enable linting
on the object to patch instead of providing as a string.

Depending on import structure, adding the module to patch may be
required.
"""
if isinstance(to_patch, property):
to_patch = to_patch.fget
new_callable = mock.PropertyMock
else:
new_callable = mock.MagicMock
if module is None:
target = f"{to_patch.__module__}.{to_patch.__qualname__}"
else:
target = f"{module.__name__}.{to_patch.__qualname__}"
with mock.patch(target, new_callable=new_callable, **kwargs) as mocked_item:
if return_value != AN_UNLIKELY_RETURN_VALUE:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you use NamedObject, you can rely on object identity and this will read nicer:

if return_value is not UNSUPPLIED:
    mocked_item.return_value = return_value

mocked_item.return_value = return_value
yield mocked_item
2 changes: 1 addition & 1 deletion pyproject.toml
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
[tool.poetry]
name = "dcicutils"
version = "7.6.0"
version = "7.7.0.1b1"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally the convention we are using is to make the beta be epsilon more than the version it is built from, not the build it is targeting. For example, if a 7.7.0 is issued while this is being developed, and you have not merged that version, this version will sort higher and so things that seek "^7.7.0" will think you have support for everything below this number and I don't think you do. Unless maybe you've merged 7.7.0 into here, in which case ignore me. :) I often add a comment saying something about what the highest merged version is. The entire tech for all this is heuristic, but I hope that makes sense.

description = "Utility package for interacting with the 4DN Data Portal and other 4DN resources"
authors = ["4DN-DCIC Team <[email protected]>"]
license = "MIT"
Expand Down
Loading