[Wyscout V3] Adding substitution events #367

Open
DriesDeprest opened this issue Nov 21, 2024 · 6 comments

Comments

@DriesDeprest
Contributor

In Wyscout V3 data, substitutions are not listed in the event stream but are defined separately in the raw event file. What is the best way to handle them in the deserializer?

I can create them after we have iterated through all the raw events, but how should I best go about adding them to the records? Do we already have a method that orders a list of events by period and timestamp, or should I create one?
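
To make the question concrete, here is a minimal sketch of ordering events by period and timestamp. The `Event` class and the `sort_events` helper are hypothetical stand-ins for illustration, not kloppy's actual data model or API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    # Hypothetical, simplified event record (not kloppy's real class).
    period_id: int    # 1 = first half, 2 = second half, ...
    timestamp: float  # seconds since the start of the period
    event_type: str


def sort_events(events: List[Event]) -> List[Event]:
    """Return a new list ordered by period, then by timestamp.

    sorted() is stable, so events that share the same period and
    timestamp keep their original relative order.
    """
    return sorted(events, key=lambda e: (e.period_id, e.timestamp))
```

Because the sort is stable, appending substitutions after the regular events and then sorting would place a substitution after any regular event with an identical timestamp.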

@probberechts
Contributor

Inserting substitutions into the event stream is more complex than simple sorting because Wyscout only provides the minute in which a substitution occurred, not the precise timestamp.

You would need to:

  1. Identify game interruptions within the substitution minute's window.
  2. Use the interruption duration as a tiebreaker if multiple interruptions occur in that window.
  3. Insert the substitution event just before the corresponding game restart.
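
The three steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation used in any library: it treats every gap between two consecutive events as a candidate interruption (a real implementation would more likely look at event types such as fouls or balls out of play), and it assumes the minute is expressed on the same clock as the event timestamps. The function name `place_substitution` is hypothetical.

```python
from typing import List, Optional


def place_substitution(event_times: List[float], minute: int) -> Optional[int]:
    """Return the list index at which to insert a substitution.

    event_times: sorted event timestamps in seconds (one period).
    minute: minute in which the substitution occurred (assumed to be
            on the same clock as event_times).
    """
    window_start = minute * 60
    window_end = (minute + 1) * 60

    best_index = None
    best_gap = -1.0
    for i in range(1, len(event_times)):
        prev_t, next_t = event_times[i - 1], event_times[i]
        # Step 1: keep only gaps that overlap the substitution minute.
        if prev_t < window_end and next_t > window_start:
            gap = next_t - prev_t
            # Step 2: the longest gap wins when several overlap.
            if gap > best_gap:
                best_index, best_gap = i, gap
    # Step 3: inserting at index i puts the substitution just before
    # the event that restarts play.
    return best_index
```

The function returns `None` when no gap overlaps the window, so the caller has to decide on a fallback in that case.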

@DriesDeprest
Contributor Author

Thanks for the input, Pieter!

What would happen if we mark the substitutions as happening exactly at the minute provided by Wyscout? I know we could then be off by up to 60 seconds. Would this cause any issues other than the substitution time being slightly off?

The reason I'm asking is that I don't really need second-level detail for substitutions, so I would suggest that I first create a PR which introduces substitutions using the naive approach. If someone later needs second-level detail, they could enhance the logic by identifying game interruptions to place substitutions more precisely.

I just don't want the desire for a perfect solution to stand in the way of already implementing the main goal, getting the substitution information (albeit with a lower level of detail) in the dataset.

What do you think?

@probberechts
Contributor

You will get substitution events when the ball is in play and logical event sequences will get interrupted (e.g., you could get a substitution between a player's carry and pass). It just doesn't make sense at all.

Also, it will break code that derives things from subsequent events. For example, I have some logic that determines whether the ball is in play. This will break.

It's really not that hard to implement it correctly.

@DriesDeprest
Contributor Author

Okay, if it results in interrupted event sequences and can break downstream logic, it should indeed be done correctly from the start. I'll share a PR soon. Thanks for thinking this through together!

@DriesDeprest
Contributor Author

@probberechts when reviewing the substitutions in the Wyscout events v3 file, it looks like they are not expressed at minute granularity but in exact seconds.

I assume we therefore do not need to identify game interruptions and can simply insert each substitution into the records between the events that happen before and after it?

"substitutions": {
        "3159": {
            "2H": {
                "1278": {
                    "in": [
                        {
                            "playerId": 20395
                        }
                    ],
                    "out": [
                        {
                            "playerId": 489124
                        }
                    ]
                },
                "1951": {
                    "in": [
                        {
                            "playerId": 361807
                        }
                    ],
                    "out": [
                        {
                            "playerId": 20751
                        }
                    ]
                },
                "2192": {
                    "in": [
                        {
                            "playerId": 105334
                        },
                        {
                            "playerId": 345695
                        }
                    ],
                    "out": [
                        {
                            "playerId": 472363
                        },
                        {
                            "playerId": 20461
                        }
                    ]
                }
            }
        },
        "3164": {
            "2H": {
                "4": {
                    "in": [
                        {
                            "playerId": 703
                        },
                        {
                            "playerId": 20479
                        },
                        {
                            "playerId": 20689
                        }
                    ],
                    "out": [
                        {
                            "playerId": 415809
                        },
                        {
                            "playerId": 449978
                        },
                        {
                            "playerId": 21006
                        }
                    ]
                },
                "1461": {
                    "in": [
                        {
                            "playerId": 449472
                        },
                        {
                            "playerId": 20446
                        }
                    ],
                    "out": [
                        {
                            "playerId": 239298
                        },
                        {
                            "playerId": 237057
                        }
                    ]
                }
            }
        }
    }
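
As a rough sketch, the nested structure above could be flattened into one record per player swap. Several things here are assumptions rather than documented Wyscout behaviour: that the top-level keys are team IDs, that the innermost keys are seconds since the start of the period, and that the `in` and `out` lists can be paired by position. The helper name `flatten_substitutions` is hypothetical.

```python
from typing import Any, Dict, List


def flatten_substitutions(substitutions: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten {team: {period: {second: {"in": [...], "out": [...]}}}}."""
    records = []
    for team_id, periods in substitutions.items():
        for period, by_second in periods.items():
            for second, change in by_second.items():
                # Assumption: the i-th player coming on replaces the
                # i-th player going off.
                for p_in, p_out in zip(change["in"], change["out"]):
                    records.append(
                        {
                            "team_id": team_id,
                            "period": period,
                            "second": int(second),
                            "player_in": p_in["playerId"],
                            "player_out": p_out["playerId"],
                        }
                    )
    # Sorting the period label as a string is enough for "1H" and "2H".
    records.sort(key=lambda r: (r["period"], r["second"]))
    return records
```

With the sample above, the triple substitution at second 4 would yield three records, followed by the single swap at second 1278.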

@probberechts
Contributor

Oh yes, substitutions seem to be encoded twice in the event stream. I was looking at this encoding:

"substitutions": [
        {
            "minute": 29,
            "playerIn": 680331,
            "playerOut": 38426,
            "assists": "0"
        }, ...
]

Then it probably works to insert this based on timing.
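
A minimal sketch of inserting based on timing, assuming events are represented as `(period_id, timestamp, label)` tuples that are already sorted. The tuple layout and the `insert_by_time` helper are hypothetical and only serve the illustration.

```python
import bisect
from typing import List, Tuple

EventTuple = Tuple[int, float, str]  # (period_id, timestamp, label)


def insert_by_time(events: List[EventTuple], substitution: EventTuple) -> List[EventTuple]:
    """Return a new list with the substitution inserted in time order.

    bisect_left places the substitution before any event that has the
    same period and timestamp, so it never lands after the event that
    restarts play at that instant.
    """
    keys = [(period_id, timestamp) for period_id, timestamp, _ in events]
    index = bisect.bisect_left(keys, (substitution[0], substitution[1]))
    return events[:index] + [substitution] + events[index:]
```

For many substitutions it may be simpler to append them all and sort once, but a single binary-search insert keeps the tie-breaking rule explicit.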
