Structure of Aggregated XML Output #3
Comments
Hello Asbjoern, I am not quite sure I understand your questions. However, here is what I was able to determine.
However, there is no agreement within the section for .xlsx files.
This conflict can often be resolved by the "normalization" of tool output within the FITS code, as has been done for other file formats. By doing this you would be able to see an However, what I see as a larger problem is that there is no metadata output for any files in this corpus as seen in the empty Sincerely,
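To illustrate the general idea of normalization mentioned above, here is a minimal Python sketch that maps variant MIME type strings to one canonical value before comparing tools. It is a post-processing illustration only and not how FITS implements normalization internally. The alias table, the tool names in the example and the function names `normalize_mimetype` and `tools_agree` are all assumptions made for this sketch; the variant strings are not taken from the output discussed in this thread.

```python
# Canonical MIME type for .xlsx workbooks.
XLSX = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"

# Hypothetical alias table: these variants are illustrative examples only.
# Mapping a generic container type to XLSX is only reasonable for a corpus
# that is already known to consist solely of .xlsx files.
ALIASES = {
    "application/x-tika-ooxml": XLSX,
    "application/zip": XLSX,
}


def normalize_mimetype(value):
    """Lower-case the value, drop any parameters, and apply the alias table."""
    key = value.split(";")[0].strip().lower()
    return ALIASES.get(key, key)


def tools_agree(reported):
    """Return True if all tools report the same MIME type after normalization.

    `reported` maps a tool name to the MIME type string that tool produced.
    """
    return len({normalize_mimetype(v) for v in reported.values()}) == 1


if __name__ == "__main__":
    example = {
        "ToolA": "application/zip",
        "ToolB": "Application/X-Tika-OOXML",
        "ToolC": XLSX,
    }
    print(tools_agree(example))
```

The same approach could be applied to PUIDs with a second alias table, if the variants reported for a corpus are known.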
Hello David,
Happy New Year! Thank you for explaining the normalisation possibility. We had been wondering why the different tools reported different MIME types and PUIDs for the .xlsx format. The metadata section would also be nice to have, but we cannot dedicate the resources either. We had actually not yet noticed that it was missing. :-)
I also now understand your difficulties with our previous issue: we provided you with a spreadsheet of output data that did not include the new -a FITS analysis! I have embedded the updated file. It opens on the FITS sheet by default. In the sheet you can see that the analysis of the first file is spread over the first 28 rows, rather than all of the information for that file appearing in the columns of a single row. Do you see? What do you think?
Regards
Hello Asbjørn, If I understand correctly, your concern is why the FITS output for a single file appears in multiple rows, as seen in the spreadsheet you supplied. If so, I’m afraid I have no explanation for this. Though I’m a software developer, I have no knowledge or experience of importing XML into an Excel spreadsheet. If your concern is more about the FITS output itself, then please explain further. Thank you,
Hello Dave
First of all, thank you so much for this quick solution for an aggregated XML output file (fitscollection.xml). That's simply amazing.
Today we ran some tests with 1.5.1-SNAPSHOT on our batch of files, "Excel test korpus", which we created to analyze and convert Excel files as part of an ongoing investigation into whether to accept spreadsheets in our collections in formats other than the current standard format, TIFF.
The -a function does exactly what we need it to do. We now have a single XML output file to import into our analytical program (Excel), and in this file we compare the outputs of different identification, characterization and validation software, so far including FIDO, JHOVE, DROID, Siegfried and FITS.
However, each of these other programs produces a single output file (CSV or XML) in which each analyzed file corresponds to one row in our imported Excel sheet. This differs from FITS, which outputs 27 rows per analyzed file. This leads us to believe that there is something in the XML schema or structure that is perhaps well suited to the analysis of a single file with a single XML output, but not very well suited to a single XML file containing multiple analyzed files.
What do you think of this? Are we simply misunderstanding something?
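One possible explanation for the extra rows, offered here as an assumption rather than a confirmed diagnosis, is that the import expands every repeated child element (for example, several tool entries under one record) into its own row. A workaround that avoids the question entirely is to flatten the aggregated XML into one row per analyzed file before opening it in Excel. The Python sketch below does this under stated assumptions: the root element's direct children are taken to be the per-file records, and the sample document with its element and attribute names is hypothetical, since the real structure of fitscollection.xml is not shown in this thread. The function names are likewise made up for this sketch.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample: element and attribute names are illustrative only.
SAMPLE = """<fitscollection>
  <fits>
    <identification>
      <identity format="Office Open XML Workbook">
        <tool toolname="ToolA"/>
        <tool toolname="ToolB"/>
      </identity>
    </identification>
    <fileinfo><filename>a.xlsx</filename><size>1024</size></fileinfo>
  </fits>
  <fits>
    <identification>
      <identity format="ZIP Format"><tool toolname="ToolA"/></identity>
    </identification>
    <fileinfo><filename>b.xlsx</filename><size>2048</size></fileinfo>
  </fits>
</fitscollection>"""


def local(tag):
    """Strip an XML namespace of the form {uri}name, if present."""
    return tag.rsplit("}", 1)[-1]


def flatten(record):
    """Turn one per-file record into a single dict of column name -> value.

    Column names are element paths; attributes are written as path@name.
    Repeated elements are joined with "; " instead of creating new rows.
    """
    cells = {}

    def add(key, value):
        value = (value or "").strip()
        if value:
            cells.setdefault(key, []).append(value)

    def walk(elem, path):
        add(path, elem.text)
        for name, value in elem.attrib.items():
            add(f"{path}@{local(name)}", value)
        for child in elem:
            walk(child, f"{path}/{local(child.tag)}")

    walk(record, local(record.tag))
    return {key: "; ".join(values) for key, values in cells.items()}


def collection_to_rows(xml_text):
    """Assume each direct child of the root element describes one file."""
    root = ET.fromstring(xml_text)
    return [flatten(record) for record in root]


def rows_to_csv(rows):
    """Serialize the rows as CSV, using the union of all columns seen."""
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(collection_to_rows(SAMPLE)))
```

To try this on real output, the text of the aggregated file would be passed to `collection_to_rows` and the result of `rows_to_csv` saved to a .csv file that Excel can open directly. Joining repeated values with "; " keeps one row per file at the cost of packing several tool results into one cell; a separate column per tool would be an alternative design.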
We have attached our XML output file and our imported Excel file, both based on the "Excel test korpus". Be aware that GitHub does not allow the attachment of XML files, so we changed the extension to .txt, but you can change it back. I presume that's a completely harmless conversion.
Looking forward to reading your response.
Regards Asbjoern
FITS_testlog.txt
_Resultater.xlsx