Wyscout v3 events sometimes have 10 player formations for (opponent) team #315

Open
DriesDeprest opened this issue May 21, 2024 · 9 comments

Comments

@DriesDeprest
Contributor

I noticed that the team / opponent team object in Wyscout v3 events doesn't always contain a formation whose players add up to 11. For example, I've seen events with a "4-4-1" (opponent) team formation. Currently, our serializer crashes in this case, because it does not recognize the formation.

When analysing the event data of a match containing these 'troublesome' formations, I saw that they resulted from a team receiving a red card: in the events that followed, that team's formation was described as a 10 player formation.
The team originally had a "4-4-1-1" formation, but after the red card this shifted to "4-4-1".
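
As a rough illustration of how such formations could be detected, the sketch below counts the players implied by a formation string. The helper name `players_in_formation` is hypothetical and not part of kloppy; it assumes the string lists outfield players only, so the goalkeeper is added on top.

```python
def players_in_formation(formation: str) -> int:
    """Return the number of players implied by a formation string.

    Assumption: the string lists outfield players per line ("4-4-1-1"),
    so one goalkeeper is added to the sum.
    """
    outfield = sum(int(part) for part in formation.split("-"))
    return outfield + 1


print(players_in_formation("4-4-1-1"))  # 11
print(players_in_formation("4-4-1"))    # 10
```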

How do we want to handle this?

Option A:

  • Description: We create new generic kloppy formation types for all possible 10 player formations present in Wyscout and use those to describe the formation of the (opponent) team in the data.
  • Pro: This keeps the kloppy output close to the raw data input.
  • Con: Wyscout would then describe 10 player formations differently from other providers, for which we still use an 11 player formation after a red card. This results in a non-standardized approach across providers.

Option B:

  • Description: When we recognize a 10 player formation in the event data, we keep using the last valid 11 player formation observed in the data of that team for all future events of that team.
  • Pro: This keeps the behaviour standard over different providers, where after a red card we still use 11 player formations to describe team formations.
  • Con: We lose some detail in describing a team's formation.
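
Option B could be sketched roughly as follows. This is a minimal illustration, not kloppy's deserializer: the `(team_id, formation)` tuples and the function names are assumptions made for the example, and formation strings are assumed to list outfield players only.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def count_players(formation: str) -> int:
    # Outfield players per line, plus the goalkeeper (assumption).
    return sum(int(part) for part in formation.split("-")) + 1


def carry_forward_formations(
    events: Iterable[Tuple[str, str]],
) -> List[Tuple[str, Optional[str]]]:
    """Replace formations with fewer than 11 players by the team's last valid one."""
    last_valid: Dict[str, str] = {}
    resolved = []
    for team_id, formation in events:
        if count_players(formation) == 11:
            last_valid[team_id] = formation
        # None if the team has not shown an 11 player formation yet.
        resolved.append((team_id, last_valid.get(team_id)))
    return resolved


events = [("A", "4-4-1-1"), ("B", "4-3-3"), ("A", "4-4-1"), ("A", "4-4-1")]
print(carry_forward_formations(events))
```

In this example, team A keeps "4-4-1-1" for the two events recorded after its formation dropped to "4-4-1".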

I think my preference would go to option B, to have standard behaviour across different providers.

@dvilches

In my personal opinion, it is always better for the data to reflect reality as accurately as possible.

In our case, it is important to know our own and our opponent's formation, as we analyze "behavior" under different schemes, and that clearly changes when there are one or more fewer players on the field.

That said, it depends on what most users use kloppy for, since needs that are specific to us can be solved, as we have done until now, by performing our own processing of the event data.

@DriesDeprest
Contributor Author

DriesDeprest commented May 24, 2024

@koenvo @JanVanHaaren @probberechts thoughts? I'd like to start implementing this

@DriesDeprest
Contributor Author

@dvilches thanks for sharing your take on option A vs B. I understand your need of having an accurate description of a team's behaviour to perform qualitative performance analysis.

Since I'm using kloppy to read data from different providers, having a standardized output across input vendors is more important for my use case than the extra level of detail. Hence my preference for option B.

In the future, however, I think we should extend the possible Enum values of FormationType to also include formations for when there are 10/9/8 players on the pitch, and use these for all providers whenever a given team has < 11 players on the pitch.

For Wyscout, we can get the X player formation directly from the team or opponentTeam properties.
For other providers, where the formation data is not included in each event, we would need to do it in an alternative way. We would need to recognize when a team starts playing with < 11 players (due to a red card or sub off without a sub on) and based on the position (defender / midfielder / attacker) of the player that gets sent off, adapt the formation accordingly.
For example, if team A was playing in a 4-5-1 and their CM gets sent off, we would assume they now play in a 4-4-1 until they change formation again.
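
The adaptation described above might look something like the following sketch. The function name, the position group labels, and the rule for picking a midfield line are all assumptions for illustration; a dismissed goalkeeper is deliberately not handled.

```python
def formation_after_dismissal(formation: str, position_group: str) -> str:
    """Remove one player from the line matching the dismissed player's position.

    Assumptions: the first line is the defence, the last line is the attack,
    and a midfielder is taken from the largest line in between.
    """
    lines = [int(part) for part in formation.split("-")]
    if position_group == "defender":
        index = 0
    elif position_group == "attacker":
        index = len(lines) - 1
    elif position_group == "midfielder":
        middle = range(1, len(lines) - 1)
        if not middle:
            raise ValueError(f"No midfield line in formation {formation!r}")
        index = max(middle, key=lambda i: lines[i])
    else:
        raise ValueError(f"Unsupported position group: {position_group!r}")
    lines[index] -= 1
    # Drop a line entirely if its only player was sent off.
    return "-".join(str(n) for n in lines if n > 0)


print(formation_after_dismissal("4-5-1", "midfielder"))  # 4-4-1
```

The result remains a guess: as noted later in the thread, a team may reorganize completely after a red card, so the provider's own formation string is preferable whenever it is available.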

@dvilches

Hi @DriesDeprest, I agree with your perspective. That's why we're clarifying that we can resolve this issue "outside" of Kloppy, and that a quick solution for most users is more important than the "best solution" for us.
Thank you for your continued contributions to the project.

@JanVanHaaren
Collaborator

I don't have a strong opinion but I'm leaning towards option B.

In an ideal world, kloppy would be able to represent the actual formations for both teams at each point in a match, but the information that the data providers are offering might be too limited in some cases.

@probberechts
Contributor

probberechts commented Dec 18, 2024

I would probably handle these formations differently. Either we standardize them and create a very limited list of standard formations (including 4-4-2, 4-3-3, etc.), with anything that does not fit being a FormationType.NON_STANDARD; or we stick to the data provider's formation string. Currently, we are in limbo and have basically added a FormationType for every possible formation string that has been used by any of the providers. For example, I am not sure what a "3-1-2-1-1-2" looks like. Isn't that about the same as a "3-4-3" with a diamond in midfield?

This needs a lot of refinement, but I would do something like this.

# Assumes kloppy's FormationType enum, extended with the proposed NON_STANDARD member.
class TeamFormation:
    def __init__(self, formation):
        self.formation = formation
        self.lines, self.formation_type = self._parse_formation(formation)

    def _parse_formation(self, formation):
        try:
            # Number of players on each line, e.g. "4-4-2" -> [4, 4, 2]
            lines = [int(number) for number in formation.split("-")]
        except ValueError as err:
            raise ValueError(
                "Invalid formation string. Use a format like '4-4-2'."
            ) from err
        # If you want, an attempt to standardize can still be made here
        formation_type = FormationType.NON_STANDARD
        return lines, formation_type

# Create a 4-4-2 formation
formation = TeamFormation("4-4-2")

@DriesDeprest
Contributor Author

@probberechts I like your approach and agree that this would be a better way to handle formations in general.

Do you think it would make sense to first finish and merge the PR which uses option B to handle <11 player formations in Wyscout v3, so that at least the code does not break?

And that after that we create a separate issue for introducing a TeamFormation and reworking our FormationType?

@probberechts
Contributor

Do you think it would make sense to first finish and merge the PR which uses option B to handle <11 player formations in Wyscout v3, so that at least the code does not break?

For a quick fix, I strongly prefer Option C.

Option C:

  • Description: When a 10-player formation is detected in the event data, we set the formation to something like FormationType.NON_STANDARD, FormationType.NOT_SET, or a similar placeholder.
  • Pro: Consider a scenario where a team transitions from a 4-4-2 formation to a 5-2-1 formation after receiving a red card. It wouldn’t make sense to record that the team continued playing in the 4-4-2 formation. In such cases, it’s far better to leave the formation unset or indicate that it’s non-standard rather than inaccurately assigning a formation.
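
A minimal sketch of what Option C could look like is shown below. It is not the code from the linked PR: the `FormationType` enum here is a small stand-in defined for the example rather than kloppy's actual enum, `NON_STANDARD` is the placeholder name suggested above, and `parse_formation` is a hypothetical helper.

```python
from enum import Enum


class FormationType(Enum):
    # Stand-in enum for illustration only.
    FOUR_FOUR_TWO = "4-4-2"
    FOUR_FOUR_ONE_ONE = "4-4-1-1"
    NON_STANDARD = "non-standard"


def parse_formation(raw: str) -> FormationType:
    """Map a provider formation string to a FormationType.

    Formations with fewer than 11 players (outfield players plus the
    goalkeeper) become the NON_STANDARD placeholder instead of raising.
    """
    outfield = sum(int(part) for part in raw.split("-"))
    if outfield + 1 < 11:
        return FormationType.NON_STANDARD
    return FormationType(raw)


print(parse_formation("4-4-1-1"))  # FormationType.FOUR_FOUR_ONE_ONE
print(parse_formation("4-4-1"))    # FormationType.NON_STANDARD
```

Unlike Option B, this never reports a formation the team is no longer playing; the cost is that the 10 player shape itself is not preserved.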

@DriesDeprest
Contributor Author

Okay, I've implemented option C in #330. Let me know what you think.


4 participants