Strict static typing when maxOccurs > 1 or <xs:sequence> isn't occurring exactly once #1121

Open
sashkent3 opened this issue Mar 1, 2025 · 0 comments

This issue is about improving the static types that xsdata generates so that they reflect the original schema more precisely.

maxOccurs > 1

Consider the following schema:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="Root">
    <xs:sequence>
      <xs:element name="Child" type="xs:string" minOccurs="0" maxOccurs="2"></xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:schema>

Here's a piece of code xsdata generates from it:

class Root:
    child: tuple[str, ...]

The generated type for Root.child is overly permissive: from a static typing perspective it allows tuples of arbitrary length, although the schema permits at most two elements. Arguably, the type of Root.child should be either tuple[()] | tuple[str] | tuple[str, str] or None | tuple[str] | tuple[str, str]. Python doesn't currently have a shorthand for fixed-length homogeneous tuples (see python/typing#786), but since the code is generated anyway, xsdata could simply spell out the full union for any finite maxOccurs.
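To make the proposal concrete, here is a minimal sketch of how such a union could be spelled out for a finite maxOccurs. The helper name `bounded_tuple_annotation` is hypothetical and is not part of xsdata; it only builds the annotation text.

```python
def bounded_tuple_annotation(item: str, min_occurs: int, max_occurs: int) -> str:
    """Return a union of fixed-length tuple annotations, one per allowed length.

    Hypothetical helper for illustration only; not an xsdata API.
    """
    if not 0 <= min_occurs <= max_occurs:
        raise ValueError("expected 0 <= min_occurs <= max_occurs")
    members = []
    for length in range(min_occurs, max_occurs + 1):
        if length == 0:
            # The empty tuple has its own spelling.
            members.append("tuple[()]")
        else:
            members.append(f"tuple[{', '.join([item] * length)}]")
    return " | ".join(members)


print(bounded_tuple_annotation("str", 0, 2))
# tuple[()] | tuple[str] | tuple[str, str]
```

The union grows linearly with maxOccurs, so a real generator would probably want a cutoff above which it falls back to a wider type.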
The union can't be enumerated when maxOccurs = 'unbounded'. In that case the type tuple[T, T, T, *tuple[T, ...]] should be used, where the leading T is repeated minOccurs times (three in this illustration), or simply tuple[T, ...] if minOccurs = 0. The unpacked *tuple[T, ...] form comes from PEP 646 and requires Python 3.11 or later.
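The unbounded rule can be sketched the same way. Again, `unbounded_tuple_annotation` is a hypothetical name used only for illustration; the star-unpacking inside the emitted annotation is PEP 646 syntax, so the generated code (not this helper) would need Python 3.11+.

```python
def unbounded_tuple_annotation(item: str, min_occurs: int) -> str:
    """Return the annotation for maxOccurs='unbounded'.

    Hypothetical helper for illustration only; not an xsdata API.
    """
    if min_occurs < 0:
        raise ValueError("min_occurs must be non-negative")
    if min_occurs == 0:
        # No lower bound: a plain variable-length tuple is already exact.
        return f"tuple[{item}, ...]"
    # min_occurs required items, followed by any number of further items.
    required = ", ".join([item] * min_occurs)
    return f"tuple[{required}, *tuple[{item}, ...]]"


print(unbounded_tuple_annotation("str", 3))
# tuple[str, str, str, *tuple[str, ...]]
print(unbounded_tuple_annotation("str", 0))
# tuple[str, ...]
```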

<xs:sequence> isn't occurring exactly once

Unless <xs:sequence> has both minOccurs = 1 and maxOccurs = 1, the sequence repeats as a unit, so every child of the sequence is meant to repeat the same number of times. Here's an example schema:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="Root">
    <xs:sequence minOccurs="0" maxOccurs="2">
      <xs:element name="FirstChild" type="xs:string"></xs:element>
      <xs:element name="SecondChild" type="xs:string"></xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:schema>

And here are the types xsdata generates:

class Root:
    first_child: tuple[str, ...]
    second_child: tuple[str, ...]

The problem here is twofold:

  1. These types are once again needlessly wide, as they don't reflect the maxOccurs="2" constraint.
  2. These types don't reflect that first_child and second_child must repeat the same number of times.
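Both points can be seen with a simplified stand-in for the generated class. The dataclass below is an assumption for illustration (real xsdata output carries field metadata that is omitted here); the value it builds violates both constraints, yet matches the declared types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Root:
    # Simplified stand-in for the generated class; xsdata metadata omitted.
    first_child: tuple[str, ...] = ()
    second_child: tuple[str, ...] = ()


# Three FirstChild values (more than maxOccurs="2") and a single SecondChild
# (a different count). Both fields still match tuple[str, ...], so a type
# checker has nothing to report.
root = Root(first_child=("a", "b", "c"), second_child=("x",))
print(len(root.first_child), len(root.second_child))
# 3 1
```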

I propose that xsdata instead treat a single occurrence of <xs:sequence> as a separate type and apply all repetitions to that type, according to the rules outlined in the first section of this issue. For example, the above schema would roughly translate to:

class RootSequence:
    first_child: str
    second_child: str

class Root:
    sequences: tuple[()] | tuple[RootSequence] | tuple[RootSequence, RootSequence]
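
A runnable sketch of this shape, assuming plain frozen dataclasses rather than real xsdata output. The `element_order` helper is hypothetical and only shows that the grouped representation still yields the elements in document order.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class RootSequence:
    first_child: str
    second_child: str


@dataclass(frozen=True)
class Root:
    # Same as tuple[()] | tuple[RootSequence] | tuple[RootSequence, RootSequence];
    # Union is used so the annotation also evaluates on Python 3.9.
    sequences: Union[
        tuple[()], tuple[RootSequence], tuple[RootSequence, RootSequence]
    ] = ()


def element_order(root: Root) -> list[tuple[str, str]]:
    """List (element name, value) pairs in document order (illustrative helper)."""
    pairs = []
    for seq in root.sequences:
        pairs.append(("FirstChild", seq.first_child))
        pairs.append(("SecondChild", seq.second_child))
    return pairs


root = Root(sequences=(RootSequence("a", "x"), RootSequence("b", "y")))
print(element_order(root))
# [('FirstChild', 'a'), ('SecondChild', 'x'), ('FirstChild', 'b'), ('SecondChild', 'y')]
```

Because each RootSequence requires both fields, the "same number of repetitions" constraint holds by construction, and the length bound lives in the type of `sequences`.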

@sashkent3 sashkent3 changed the title Strict-ish static typing when maxOccurs > 1 or <xs:sequence> isn't occuring exactly once Strict static typing when maxOccurs > 1 or <xs:sequence> isn't occuring exactly once Mar 1, 2025