Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decoupled Tracab dat / json from meta data file types #364

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

UnravelSports
Copy link
Contributor

@UnravelSports UnravelSports commented Nov 20, 2024

I noticed in the Tracab data-loader we were enforcing that whenever a .dat file was provided we could only provide a .xml meta data file, and when we provide a .json tracking file this would always need to be paired with a .json meta data file.

I found a situation where this was not the case (ie. we had .dat and .json meta data), and thus the parser broke.

Concretely this means:

  • Inside tracab.load (more specifically inside identify_deserializer) we no longer use the combination of meta_data_extension and raw_data_extension to see which Deserializer we need. This is now only dependents on the raw_data_extension
  • As a result we need to communicate the meta_data_extension to the deserializer. I chose to do this by adding a new parameter to both TRACABDatDeserializer and TRACABJSONDeserializer called meta_data_extension. (Note: I tried to do this differently, but because we already open the meta data inside load and pass it directly to the deserializer as the TRACABInputs() we can't retrieve the meta_data_extension after that step.
  • Inside both deserializers we now call a function load_meta_data which takes self.meta_data_extension and inputs.meta_data and returns this ugly thing. (I can clean this up by return a dictionary with these values or something, but not sure if that's necessary.
(
    pitch_size_height,
    pitch_size_width,
    teams,
    periods,
    frame_rate,
    date,
    game_id,
) = load_meta_data(self.meta_data_extension, inputs.meta_data)
  • Side note: Doing this load_meta_data step made me realize that Enriched metadata with date, game_week and game_id #340 had an oversight and did not include UTC date and game_id in the TracabDATDeserializer. This means I had to edit line 176 in kloppy/tests/test_tracab.py to make the test reflect UTC time. (This is the test that currently fails though)
  • load_meta_data lives in a newly created helpers.py file inside kloppy/infa/serializers/tracking/tracab/ and it acts a a simple switching board to grab either load_meta_data_xml or load_meta_data_json based the meta_data_extension. (This could be done differently, but not sure if that's necessary).
  • This new helpers.py file also contains both functions for parsing the correct meta data file.
  • I have added a simple test to check that everything still works when we provide the new combinations of tracking and meta data files.
  • I moved some of the tests into a general function we can call from multiple tests called meta_tracking_assertions. I did this because we're running the same asserts 4 times on 4 different combinations of tracking and meta data.

@UnravelSports
Copy link
Contributor Author

It seems my local version and the GitHub tests don't agree on what UTC time is...

@UnravelSports
Copy link
Contributor Author

Does anyone have any idea what could be causing this timezone/conversion mismatch? When I test it locally this asserting is true:

assert date == datetime(
                2023, 12, 15, 19, 32, 20, tzinfo=timezone.utc
            )

But here the automated tests seems to be in disagreement by 1 hour and want it to be (2023, 12, 15, 20, 32, 20). The failed test can be found here.

assert datetime.datetime(2023, 12, 15, 20, 32, 20, tzinfo=datetime.timezone.utc) == datetime.datetime(2023, 12, 15, 19, 32, 20, tzinfo=datetime.timezone.utc)
E            +  where datetime.datetime(2023, 12, 15, 19, 32, 20, tzinfo=datetime.timezone.utc) = datetime(2023, 12, 15, 19, 32, 20, tzinfo=datetime.timezone.utc)
E            +    where datetime.timezone.utc = timezone.utc

@probberechts
Copy link
Contributor

I think it is this line:

parse(meta_data.match.attrib["dtDate"]).astimezone(timezone.utc)

You first parse the date using your local timezone, then the .astimezone() operation switches the datetime to a different timezone. I suppose the timezone on your machine is UTC+1, while GHA is on UTC.

I think the correct implementation is

parse(meta_data.match.attrib["dtDate"]).replace(tzinfo=timezone.utc)

@UnravelSports
Copy link
Contributor Author

Thanks, totally didn't notice that!

@UnravelSports
Copy link
Contributor Author

FWIW, this should now be fixed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants