Improve information obtainable from Dataset `version` property #230
Comments
As detailed in #230, there are a number of issues with the current behavior of dataset.version. This provides a clunky workaround in a situation where the improved behavior would be beneficial.
I disagree. In the case where no version is specified, the Standard specifies that |
So it does! (and it's mandatory at Have edited the main post to fix this. |
After some internal team discussion on the merits of this change, I am now inclined to support it: it is more Pythonic yet still allows access to invalid data from elements, albeit by parsing an exception object. However, it would be good to get feedback from data users who:
Therefore, @andylolz @markbrough @akmiller01 - do you have any thoughts? This change will likely set a precedent for the way in which we extend pyIATI in future to normalise IATI data (i.e. significantly reduce the complexity required to handle data in versions |
So first thought is that it would probably be useful to handle the case of a dataset having multiple versions. For example, the Datastore returns the I don't think 8 should return an error, I think it should return a list of multiple versions. This might be a bit off topic though, because I am not really sure what you would do with the result of a Dataset? Is this something I would get out of the Datastore, or just if I am trying to validate a single package on the Registry? (I am not being facetious, I am just unclear what happens with the output of this) |
+1 to @markbrough comments. Had my first experience with pyIATI this week; just some simple testing to get used to it. I'm not really sure what value the validation is adding when parsing something like output from the Datastore. Each activity has a different version, but because the
And beyond that, the |
The IATI Standard itself only permits a single version to be specified - something like the contents of the Datastore's
The Datastore does not output valid IATI XML. As such, it is necessary to convert it into a version that is valid IATI XML before pyIATI will handle it in the manner that is likely desired.
A |
This does not appear to be true for version 1. Also, there is clearly a use case for this in derivative datasets or third-party tools even if this is not something that publishers should be doing in their own data.
So yeah, maybe the confusion stems from the fact that I am not really sure what I am supposed to use this for. Validating real-world data coming out of real-world tools seems within scope? Why would I want to convert the data from the Datastore, just to pipe it through pyIATI? I am not sure where that gets me. Or does the Datastore output need to be corrected to output valid IATI XML? Again though maybe I am missing the point of pyIATI. |
More specifically, a data file may only be a single version - should multiple be specified, the combination rules must be followed to determine the single version that the data is specified at.
Aye, hence inheritance and custom namespaces are things.
Pretty much this. The XML output from the Datastore (or any unmodified subtree at a scope greater than |
Are we talking about the organisation standard here, too? For |
Until this thread I had no idea that in 1.0x, It seems like in practice, it’s pretty widely omitted, which would break these combination rules you’ve created. About 139 publishers get it wrong for activity files, compared with only 15 publishers getting it right. The numbers are worse for org files – something like 165 publishers get it wrong; just 6 get it right. So if you implement this as proposed, the So… I’m not at all sure about these combination rules. I absolutely agree that the 1.0x standard is really unclear (and arguably wrong) here – but I’m not sure the solution is to rigidly follow what’s (not) there. After all, very few publishers are following it. So I’m not sure of the value in retrospectively enforcing rules that have not been codified before. I guess that leads me to make the same comment others have made above – that it really depends what your use cases are for pyIATI. I gather you’re after a to-the-letter implementation of the standard as it stands… My sense is: the data that exists in the registry doesn’t follow that. |
More evidence of the mystical v
At the moment, it's deemed that a pyIATI Dataset contains data at a single version. Should there be data at multiple versions, multiple pyIATI Datasets should be created... which seems like something that a utility function could help with.
To be honest, nor am I. @dalepotter knows more about them than I do, though it seems like a reasonable way to work within the above design concept until a reasonable validation-based approach is implemented (current thought: validate the data against a bunch of versions, the one with the fewest errors is probably correct; maybe mixing in something about seeing at what version the newest element or attribute was added). |
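The "fewest errors" idea mentioned above could be sketched roughly as follows. This is only an illustration: `guess_version` and the `count_errors` callable are hypothetical names, not part of pyIATI, and the error counts in the usage lines are made up to show the tie-breaking behaviour.

```python
def guess_version(data, candidate_versions, count_errors):
    """Return the candidate version that yields the fewest validation errors.

    `count_errors(data, version)` is a caller-supplied callable (hypothetical,
    not a pyIATI function) returning the number of errors found when `data`
    is validated against `version`.

    Ties go to whichever candidate appears first in `candidate_versions`.
    """
    if not candidate_versions:
        raise ValueError("at least one candidate version is required")
    return min(candidate_versions, key=lambda version: count_errors(data, version))


# Usage with invented error counts standing in for a real validator.
fake_counts = {"1.05": 4, "2.01": 1, "2.02": 1}
best = guess_version(
    "<iati-activities/>",
    ["1.05", "2.01", "2.02"],
    lambda data, version: fake_counts[version],
)
```

Because ties are resolved by list order, the order of the candidates matters; a real implementation would probably want the second signal mentioned above (the version at which the newest element or attribute was added) to break ties.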
An `iati.Dataset` has a `version` property that specifies the version of the Standard that the Dataset is specified against.

At present, there are a range of different situations that may occur:
1. (`1.01`, [...], `2.02`)
2. (`1.01`, [...], `2.02`)
3. (`1.01`)
4. `1.0x` combination rules are followed, leading to something detectable (`1.01`, [...], `1.05`)
5. (`1`, `17.6`, `-19.4`, etc)
6. (`bob`, `jim`, `one point zero one`, etc)
7. (`1`, `17.6`, `-19.4`, `bob`, `jim`, `one point zero one`, etc)
8. `1.0x` combination rules are followed, leading to no clear answer (`None`)

With these cases, at present:
This means that:
This is a mess.
As such, the property should be improved such that:
- `iati.validator`
- `1.0x` combination rules
- `iati.exceptions` that inherits from `ValueError` and has an attribute containing the located data.
- `1.0x` combination rules

This proposal would change use of this property from:
to:
This alternative is more Pythonic, provides more information at the point of failure, and simplifies the potential return values.
It also generalises far better to other pipeline work (#187) where the current code would look far clunkier (due to a lack of relevant-constant).
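A minimal sketch of the exception-based pattern described above, assuming names that are not taken from pyIATI: `VersionError`, `located_version`, `resolve_version`, `describe` and the `VALID_VERSIONS` tuple are all hypothetical. The tuple assumes the valid versions are `1.01` to `1.05` plus `2.01` and `2.02`, matching the ranges quoted in this issue.

```python
class VersionError(ValueError):
    """Hypothetical exception for a version that cannot be resolved.

    The value actually found in the data is kept on `located_version`,
    so callers can still inspect the invalid data.
    """

    def __init__(self, message, located_version=None):
        super().__init__(message)
        self.located_version = located_version


# Assumed list of valid versions (not read from pyIATI).
VALID_VERSIONS = ("1.01", "1.02", "1.03", "1.04", "1.05", "2.01", "2.02")


def resolve_version(raw_version):
    """Return `raw_version` if it is valid, otherwise raise VersionError."""
    if raw_version in VALID_VERSIONS:
        return raw_version
    raise VersionError(
        "Unrecognised version: {!r}".format(raw_version),
        located_version=raw_version,
    )


def describe(raw_version):
    """Show the calling pattern: a single return type, with failures as exceptions."""
    try:
        return "valid: " + resolve_version(raw_version)
    except VersionError as err:
        return "invalid: {!r}".format(err.located_version)
```

Since the hypothetical `VersionError` inherits from `ValueError`, existing `except ValueError` handlers would still catch it, while code that needs the invalid value can read it from the exception.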