Deduce synthetic carry events for event providers which don't annotate them #396

Open: 21 commits to be merged into base branch master

lodevt commented Jan 7, 2025

Addresses issue #379.

I followed an approach similar to the one in socceraction to add carry events for providers that don't annotate them: https://github.com/ML-KULeuven/socceraction/blob/af297a490fc521d27622b947e93fa15c5e0092da/socceraction/spadl/base.py#L38
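As a rough illustration of the idea (not the code in this PR, and not kloppy's or socceraction's API), the sketch below inserts a carry between two consecutive events of the same team when the ball position jumps by a plausible distance within a plausible time. The `SimpleEvent` and `SimpleCarry` classes, the `deduce_carries` function, and the maximum length and duration thresholds are all hypothetical; only the 3 meter minimum appears in this thread.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Tuple

# Only the 3 m minimum is taken from this thread; the other two
# thresholds are illustrative assumptions.
MIN_CARRY_LENGTH_METERS = 3.0
MAX_CARRY_LENGTH_METERS = 60.0
MAX_CARRY_DURATION_SECONDS = 10.0


@dataclass
class SimpleEvent:
    """Hypothetical, simplified event (not a kloppy model)."""
    event_id: str
    team_id: str
    player_id: str
    timestamp: float             # seconds since period start
    start: Tuple[float, float]   # (x, y) in meters
    end: Tuple[float, float]     # (x, y) in meters


@dataclass
class SimpleCarry:
    """Hypothetical synthetic carry between two events."""
    team_id: str
    player_id: str
    timestamp: float
    end_timestamp: float
    start: Tuple[float, float]
    end: Tuple[float, float]


def deduce_carries(events: List[SimpleEvent]) -> List[SimpleCarry]:
    """Return synthetic carries for a time-ordered list of events."""
    carries = []
    for prev, nxt in zip(events, events[1:]):
        # A carry only makes sense if the same team keeps the ball.
        if prev.team_id != nxt.team_id:
            continue
        # Gap between where the previous event ended and the next one starts.
        distance = hypot(nxt.start[0] - prev.end[0], nxt.start[1] - prev.end[1])
        duration = nxt.timestamp - prev.timestamp
        if not MIN_CARRY_LENGTH_METERS <= distance <= MAX_CARRY_LENGTH_METERS:
            continue
        if not 0 <= duration <= MAX_CARRY_DURATION_SECONDS:
            continue
        # Assumption: the carry is attributed to the player of the next event.
        carries.append(
            SimpleCarry(
                team_id=nxt.team_id,
                player_id=nxt.player_id,
                timestamp=prev.timestamp,
                end_timestamp=nxt.timestamp,
                start=prev.end,
                end=nxt.start,
            )
        )
    return carries
```

Attributing the carry to the player of the following event is a design choice made for this sketch; the actual implementation may decide this differently.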

The implementation and tests are not complete yet. I think a good way to test this is to remove the carry events from a StatsBomb dataset and then compare the generated carries with the carries in the original dataset. A 100% match is not achievable, but the generated carries should still closely resemble the original ones.

I am happy to hear any feedback while I continue implementing this.

lodevt commented Jan 8, 2025

To test this, I removed the carry events from a StatsBomb dataset and used them as the ground truth against which to compare the generated carries.

True positives are generated carries for which a matching StatsBomb carry by the same player is found within 5 seconds.
False positives are generated carries for which no matching StatsBomb carry is found.
False negatives are StatsBomb carries longer than min_carry_length_meters (3 meters) that have no match among the generated carries.
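The matching rules above could be sketched as follows. This is only an illustration under stated assumptions: the `Carry` class and `match_carries` function are hypothetical, each reference carry is matched at most once (greedy, closest in time first), and "within 5 seconds" is read as an inclusive bound. The actual test code in the PR may differ on all of these points.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Carry:
    """Hypothetical minimal carry record used only for evaluation."""
    player_id: str
    timestamp: float       # seconds
    length_meters: float


def match_carries(
    generated: List[Carry],
    reference: List[Carry],
    max_time_diff_seconds: float = 5.0,
    min_carry_length_meters: float = 3.0,
) -> Tuple[int, int, int]:
    """Return (TP, FP, FN) for generated carries against reference carries."""
    matched: Set[int] = set()   # indices of reference carries already used
    tp = 0
    fp = 0
    for gen in generated:
        # Unmatched reference carries by the same player, close enough in time.
        candidates = [
            i
            for i, ref in enumerate(reference)
            if i not in matched
            and ref.player_id == gen.player_id
            and abs(ref.timestamp - gen.timestamp) <= max_time_diff_seconds
        ]
        if candidates:
            # Assumption: pick the closest candidate in time.
            best = min(candidates, key=lambda i: abs(reference[i].timestamp - gen.timestamp))
            matched.add(best)
            tp += 1
        else:
            fp += 1
    # Only unmatched reference carries longer than the minimum length count as misses.
    fn = sum(
        1
        for i, ref in enumerate(reference)
        if i not in matched and ref.length_meters > min_carry_length_meters
    )
    return tp, fp, fn
```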

The current implementation reaches an accuracy of around 89% (466 TP, 38 FP, 18 FN).
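For reference, the reported counts are consistent with computing accuracy as TP / (TP + FP + FN), since true negatives are not defined in this setup. That formula is an assumption about how the figure was obtained; the short sketch below also derives precision and recall from the same counts.

```python
# Counts reported in this comment.
tp, fp, fn = 466, 38, 18

# Assumed definition: no true negatives exist, so accuracy is TP over all outcomes.
accuracy = tp / (tp + fp + fn)   # 466 / 522, about 0.893
precision = tp / (tp + fp)       # 466 / 504, about 0.925
recall = tp / (tp + fn)          # 466 / 484, about 0.963

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```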

lodevt commented Jan 16, 2025

What do you think is the best convention for generating the event_ids of the synthetic carries?

The StatsBomb deserializer follows the convention '<event_type>-<previous_event_id>' (see duels and ball out), and the Wyscout V3 deserializer similarly uses 'synthetic-<team_id>-<previous_event_id>' for synthetic formation change events (see formation change).

Based on this, I suggest using 'carry-<previous_event_id>' as the event_id for the synthetic carries.
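A minimal sketch of the proposed convention, using a hypothetical helper name (`synthetic_carry_id` is not part of kloppy):

```python
def synthetic_carry_id(previous_event_id: str) -> str:
    """Build the event_id of a synthetic carry from the event that precedes it."""
    return f"carry-{previous_event_id}"


# Example with a made-up previous event id.
print(synthetic_carry_id("abc123"))
```

Because at most one carry is generated after any given event, ids built this way stay unique as long as the provider's own event ids are unique and never start with the same prefix.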

3 participants