Maybe I missed it, but I couldn't tell if the columns in a CSV file one is checking must come in the same order as they are listed in the body of a CSV schema.
Assuming that the prolog does not specify the column count, is it acceptable to have additional columns that do not match a column entry in the body, and have them just be unchecked?
I am interested in using the validator for some scientific data where there is a known set of columns that should be checked for reasonable contents, but where I'm not sure that the ordering of columns will be consistent, and where some data providers might have added additional columns of computed values to the raw values that my schema should check.
Thank you
The ordering must match. Using the totalColumns directive means that the validator checks, at schema parse time, that the schema contains the expected number of column definitions. If you do not specify it, there will still be a validation error once the CSV file is actually read if the number of column definitions does not match the number of columns in the file.
There are some similar issues already, #21 and #13, but I'm afraid we've not had resource available to work on further developments recently, though we would welcome pull requests from others.
I suggested making the order optional because CSVs are often interpreted by tools like Python's pandas, in which the columns are name-addressable, so column ordering is not required for correct operation.
And I mentioned in my original comments that for scientific data there are often additional columns of derived quantities added that don't interfere with correct (assuming name-based addressing) processing of the data.
I imagine that these additional features could add substantially to the difficulty of validation, though.
Maybe this should be tagged as "question-edging-into-enhancement-request"!
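For illustration, here is a minimal sketch of the name-based checking I have in mind. It uses only Python's standard `csv` module, not the validator or the CSV Schema language. The column names (`site_id`, `temperature_c`, `derived_k`), the `RULES` table, and the `validate_by_name` helper are all hypothetical. Known columns are looked up by header name, so their order does not matter and extra columns are left unchecked.

```python
import csv
import io


def is_float_in(lo, hi):
    """Build a check that accepts text parsing to a float within [lo, hi]."""
    def check(text):
        try:
            return lo <= float(text) <= hi
        except ValueError:
            return False
    return check


# Hypothetical rules: column name -> predicate on the raw cell text.
RULES = {
    "temperature_c": is_float_in(-90.0, 60.0),
    "site_id": lambda text: text.strip() != "",
}


def validate_by_name(stream, rules):
    """Check only the columns named in `rules`, wherever they appear."""
    reader = csv.DictReader(stream)
    header = reader.fieldnames or []
    errors = [f"missing column: {name}" for name in rules if name not in header]
    if errors:
        return errors
    # Line numbers assume one record per line (no embedded newlines).
    for line_no, row in enumerate(reader, start=2):
        for name, check in rules.items():
            value = row[name]  # None if the row is shorter than the header
            if value is None or not check(value):
                errors.append(f"line {line_no}: bad {name}: {value!r}")
    return errors


# The provider added a derived column and reordered the known ones.
sample = io.StringIO(
    "derived_k,site_id,temperature_c\n"
    "294.15,A1,21.0\n"
    "1273.15,A2,1000.0\n"
)
errors = validate_by_name(sample, RULES)
print(errors)
```

Here only the out-of-range temperature on the second data row is reported; the unlisted `derived_k` column is ignored.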