-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Convert ecoscope.base
classes to dataframe accessors
#258
Conversation
The use of relocations/trajectories used to look like: trajectory = ecoscope.base.Trajectory.from_relocations(movebank_relocations) With accessors this is now: trajectory = movebank_relocations.trajectories.from_relocations() The main change I want to advertise/socialise is that this PR makes a choice to have all accessor methods that used to return a Relocations/Trajectory instance now return a For example, by returning a GDF, getting turn angle looks like : #all steps return a gdf
trajectory = movebank_relocations.trajectories.from_relocations()
trajectory = trajectory.loc[trajectory.groupby_col == "Habiba"]
trajectory["heading"] = [0, 90, 120, 60, 300]
turn_angle = trajectory.trajectories.get_turn_angle() vs returning trajectory = movebank_relocations.trajectories.from_relocations()
# no longer a gdf
trajectory = trajectory.gdf.loc[trajectory.gdf.groupby_col == "Habiba"]
trajectory.gdf["heading"] = [0, 90, 120, 60, 300]
turn_angle = trajectory.get_turn_angle() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just getting a few comments out to start. More to follow!
trajectory_gdf: Trajectory, | ||
trajectory_gdf: gpd.GeoDataFrame, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love that this is now possible.
Do we want to be more opinionated about typing here, perhaps via pandera schemas?
Certainly the majority of gpd.GeoDataFrames will not work if passed here.
Not necessarily a question to be solved by this PR, but something to consider as a follow-on perhaps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another way of putting this, if this can only operate on Trajectories, than it should be a method (i.e. accessor) on a Trajectory. (Which is the accessors paradigm, is a GeoDataFrame that adheres to a validated schema.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
traj = ... # but, how do we get this? 🤷
traj.ecoscope.calculate_etd()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
plain_gdf = ...
traj = plain_gdf.ecoscope.EcoTrajectory()
turn_angle = traj.ecotrajectory.get_turn_angle()
etd = traj.ecotrajectory.calculate_etd()
🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Typing for alex...)
plain_gdf.EcoscopeBase.init()
inited_gdf.relocations.thing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from typing import TypeVar
import pandera as pa
import pandas as pd
S = TypeVar("S")
class RelocationsSchema(pa.Schema): ...
class TrajectorySchema(pa.Schema): ...
class GDFWithSchema(Generic[S]): ...
@pd.api.extensions.register_dataframe_accessor("ecoscope")
class ecoscope():
def __init__(self, pandas_obj):
assert isinstance(obj, gpd.GeoDataFrame)
self._gdf = pandas_obj
def Relocations(..., parser=...) -> GDFWithSchema[RelocationsSchema]:
# possibly coerce to schema here
assert pa.validate(self._gdf, RelocationsSchema)
return self._gdf
def is_trajectory():
return pa.validate(self._gdf, TrajectorySchema)
def Trajectory(..., parser=...) -> GDFWithSchema[TrajectorySchema]:
relocs = self.Relocations(parser=parser)
...
@pd.api.extensions.register_dataframe_accessor("trajectory")
class trajectory():
def __init__(self, pandas_obj):
pa.validate(self._gdf, TrajectorySchema)
self._gdf = pandas_obj
def get_turn_angle(...) ...:
def calculate_etd(...) ...:
@pd.api.extensions.register_dataframe_accessor("is_trajectory")
class is_trajectory():
def __call__(self) -> bool:
return pa.validate(self._gdf, TrajectorySchema)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
plain_gdf = ...
reloc = plain_gdf.ecoscope.Relocations()
traj = reloc.ecoscope.Trajectory()
# ~ OR ~
traj = plain_gdf.ecoscope.Trajectory(parser=...)
traj.trajectory.get_turn_angle()
traj.trajectory.calculate_etd()
...
if not plain_gdf.is_trajectory(): ... # can accessors implement __call__()?
# ~ OR ~
if not plain_gdf.ecoscope.is_trajectory(): ...
plain_gdf.trajectory.get_turn_angle() # -> validation error
cache-environment: true | ||
cache-downloads: true | ||
init-shell: bash | ||
|
||
- name: Install pip dependencies and our package | ||
shell: bash -leo pipefail {0} | ||
run: | | ||
python -m pip install ".[all]" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice that's a much more intelligible separation between base environment and package under test.
Just to summarise the conversation with @cisaacstern, let's hold off on merging this for now and have a deeper discussion about how the core API is structured. The main things to hash out are
|
I'm going to close this issue in favour of the discussed rework that includes much broader API changes than described here. See https://github.com/wildlife-dynamics/compose/issues/27 for more info. I've split some of the smaller changes in this PR that don't strictly relate to the core change out into their own PRs, rather than deferring them until the broader API change is complete. |
Opening this as a draft to get eyes on:
Relocations
andTrajectory
are now dataframe accessors_straighttrack_properties
is nowStraighttrackMixin
and available to bothRelocations
andTrajectory
cachedproperty
in favour of built-incached_property
analysis.proximity
into Trajectorygeopandas
andnumpy
cachedproperty
with built-incached_property
? #153gpd.read_file()
#155calculate_proximity
be a function onTrajectory
? #157Tasks remaining:
add_speedmap
logic in a more generalised waygeopandas<0.14.2
, unsure why as yet