Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed default namespace not working correctly #87

Merged
merged 4 commits into from
Nov 23, 2024

Conversation

marckoehler
Copy link

@marckoehler marckoehler commented Nov 18, 2024

From my understanding, the code does not handle cases correctly where there are default namespaces supposed to be used by an xsd file. Rather, the code assumes the default namespace to be the targetNamespace, which is incorrect. This bugfix fixes the behavior to work as intended by W3C XML Schema.

Two other bugs were fixed along the way:

  • The implicit anyType for an element - if the namespace http://www.w3.org/2001/XMLSchema was the default namespace - would be passed as ':anyType' instead of just 'anyType' which led to wrong behavior later on.
  • If an element used a namespace prefix and that namespace could not be found in the schema, the code just assumed that the element's namespace was the target namespace, which was incorrect. In this case, an error will now be thrown.

Consider this example:

<schema targetNamespace="http://www.example.com" xmlns="http://www.w3.org/2001/XMLSchema">
    <attribute name="myCustomAttribute" type="int"/>
</schema>

The default namespace is defined as http://www.w3.org/2001/XMLSchema. Thus, every child element of schema (the attribute element as well as the type int) belongs to this namespace. The code, however, assumes during its validation process that the type int belongs to the targetNamespace http://www.example.com/ and looks for its definition there, where it is obviously not going to find it since this namespace only defines the attribute myCustomAttribute and nothing else. So the code throws an error.

With the bugfix applied, the code assumes the correct default namespace of http://www.w3.org/2001/XMLSchema to be the one in which to look for the definition of type int for, thus not throwing an error.

There is an edge case here where no default namespace could be defined in the schema above, and it could still be valid XSD. The type int would be in no namespace. The code, however, throws an error without further validation. To validate this edge case correctly, the code would have to look for the definition of type int in the imported schemas that define 'no namespace' components, which are indicated by the 'noNamespaceSchemaLocation' attribute in the http://www.w3.org/2001/XMLSchema-instance namespace. This edge case would require some more refactoring within the namespace handling and is therefore ignored within this bugfix. I will probably look into it later on.

The tests also had to be adapted to match the new logic. In general, I added the xmlns:ex=http://www.example.com/ attribute to the schema element to define the 'ex' prefix for the namespace http://www.example.com/, which was the targetNamespace for all the tests that needed changing. I then also had to reference the components defined in the schemas by prefixing them with the 'ex' prefix since I did not provide a default namespace, and otherwise we would be in the edge case above. Lastly, I added 2 new tests to check if the default namespace is used correctly and that an unknown namespace to the schema fails the validation.

For further information, https://www.oracle.com/technical-resources/articles/srivastava-namespaces.html explains very well how default namespace and targetNamespace work within XSD.

@Guite Guite requested review from goetas and Guite November 21, 2024 07:02
@Guite
Copy link
Collaborator

Guite commented Nov 21, 2024

LGTM
Thank you very much @marckoehler

Waiting for @goetas taking a look as well before merging.

Since these changes have some potential for breaking existing integrations, this should become a new minor version.

@goetas
Copy link
Member

goetas commented Nov 22, 2024

I'm surprised that such issue wasn't found before. It makes sense! thanks @marckoehler for digging that deep in the rabbit hole!

@goetas
Copy link
Member

goetas commented Nov 22, 2024

@marckoehler I'm not aware of the reason for you working with this library, but do you know https://github.com/goetas-webservices/xsd2php ? is there any chance you could run your changes against that project and see if something breaks?

@Guite is there any chance you could run this changes against your project as well?

@veewee is there any chance you could run this changes against your project as well?

I would like to merge this but I'm aware that is might be a bc break, so best would be to check it carefully.

@Guite
Copy link
Collaborator

Guite commented Nov 22, 2024

Yes, I am going to do some tests next week 👍

@marckoehler
Copy link
Author

I'm surprised that such issue wasn't found before. It makes sense! thanks @marckoehler for digging that deep in the rabbit hole!

My pleasure! Turned out to be quite the rabbit hole! According to https://www.oracle.com/technical-resources/articles/srivastava-namespaces.html the code still needs some work with 'no namespace' and 'qualified vs. unqualified' elements and attributes. Since I'm now so deep in, I will probably make some more PRs for this when I have time.

@marckoehler I'm not aware of the reason for you working with this library, but do you know https://github.com/goetas-webservices/xsd2php ? is there any chance you could run your changes against that project and see if something breaks?

I'm currently trying to get phpro/soap-client to work with my project. This has a dependency to php-soap/wsdl-reader which lead me to this project. After applying the change, phpro/soap-client successfully generates PHP classes from my wsdl (not public unfortunately so i can't share it here). So it seems that wsdl-reader and soap-client still work after applying this change.

This makes sense because the change really only happens during the schema validation process when the schema object is being populated. The actual schema object itself is still populated just like before. No changes here. And since this object is the one being used by other projects, I would only expect this to break on schemas that should not have been considered valid in the first place. Of course, this still means that there could be people whoose xsd files don't work anymore with this reader.

So much for the theory... I'll take a look at xsd2php with this change applied and see if somethig breaks with the xsd files I have. 👍

@Guite
Copy link
Collaborator

Guite commented Nov 22, 2024

I would only expect this to break on schemas that should not have been considered valid in the first place. Of course, this still means that there could be people whoose xsd files don't work anymore with this reader.

Interestingly, I found only some errors yet for xsd files that had been converted from dtd files and are therefore quite dirty. Still working through them.

Seems your changes cause some hardening that is really good for the overall behaviour of this library 👍

Copy link
Collaborator

@Guite Guite left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice, thank you very much @marckoehler

Please check the cs-fixer issues at https://github.com/goetas-webservices/xsd-reader/actions/runs/11890555186/job/33371113208?pr=87 to make it green.

@veewee
Copy link
Contributor

veewee commented Nov 22, 2024

Nice work @marckoehler ! I've been bitten by this behaviour as well in the past, but thought it was just me not being able to craft a good XSD file :)

I've ran the change on my collection of WSDLs and noticed (only) one failing with this specific change, which used to work without the patch:
protel.wsdl.zip

In SchemaReader.php line 1131:

  Can't find type named {}#HTNG_PaymentCardType, at line 39357 in protel.wsdl

Since you are using phpro/soap-client, you can get the same error by running:

vendor/bin/wsdl inspect protel.wsdl

(It is an old flattened file of this service which now works fine, so the issue might lie in either the old version of the WSDL or a bug in the flattening process that has been resolved)

(It's a rather big service - I might free up some time next week to dig into why it crashes specifically in this case.)

I did run my testsuites on this change as well, which were all valid. However, I did not run this change against actual code generation. So I'm not sure this will break things downstream at this moment.


The tests also had to be adapted to match the new logic. In general, I added the xmlns:ex=http://www.example.com/ attribute to the schema element to define the 'ex' prefix for the namespace http://www.example.com/, which was the targetNamespace for all the tests that needed changing. I then also had to reference the components defined in the schemas by prefixing them with the 'ex' prefix since I did not provide a default namespace, and otherwise we would be in the edge case above. Lastly, I added 2 new tests to check if the default namespace is used correctly and that an unknown namespace to the schema fails the validation.

Changing all these tests seems like a bad idea to me. The tests should end up with the same result as before, just not for the cases where they are supposed to fallback to the default XSD value. Or maybe I am not fully grasping what you are explaining above. It's probably a better idea to deal with the edge case in here to make sure this package keeps on providing valid XSD information as it would before?

Copy link
Member

@goetas goetas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marckoehler I'm good with merging your changes. From what I can read seems that is overall an improvement, and if it breaks something probably is "for good".

Just please address the @Guite 's suggestion regarding code style.

After that, it should be good to go!

@goetas
Copy link
Member

goetas commented Nov 22, 2024

@veewee

Changing all these tests seems like a bad idea to me. The tests should end up with the same result as before, just not for the cases where they are supposed to fallback to the default XSD value. Or maybe I am not fully grasping what you are explaining above. It's probably a better idea to deal with the edge case in here to make sure this package keeps on providing valid XSD information as it would before?

The issue that @marckoehler discovered was even unknown to me, so my tests were testing the wrong behavior. With the changes he did, now tests are written according to specs.

@marckoehler
Copy link
Author

@veewee The wsdl in your error is missing a default namespace while also not using namespaces to qualify its types and refs, which is not valid xsd. This is exactly what the tests looked like after my change so like @goetas said all of them had to be rewritten since they were failing on a wrong basis. That's why I changed them to work with the new logic.

My XSD files also work with xsd2php after applying the changes there, so that's good.

The cs fixer changes have also been applied.

@veewee
Copy link
Contributor

veewee commented Nov 23, 2024

Thanks for looking into it and explaining @marckoehler ! I've had some namespace bugs whilst flattening in the past, so it's probably that then.

Looks good to me!

@goetas goetas merged commit 0766af5 into goetas-webservices:master Nov 23, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants