This script analyses a DROID CSV export and flags common issues for investigation. It checks the eight criteria listed below and outputs a Microsoft Excel (xlsx) file with a tab for each criterion.
- Unidentified Formats – No identification in PRONOM, would be a cause for investigation and may require an update to PRONOM.
- Extension only identification – Only identified by extension, which is a less certain method than Binary or Container identification, so may require further validation.
- Multiple IDs – Would be a cause for investigation and may require an update to PRONOM.
- Extension Mismatches – Has an extension which is not listed against the PUID identified. Would be a cause for investigation and may require an update to PRONOM.
- Compressed archival container formats (e.g. zips).
- Duplicate files – Identified by checksum. The script expects a SHA256 hash to have been generated in the DROID export.
- Zero byte files.
- Formats not on the white list – Files identified with PUIDs that are not listed in the attached white list. The white list is a CSV file which can be edited to match requirements.

The information flagged by Freud can be useful in surveying collections and informing any actions you decide are required. If Freud identifies any issues which you feel may require adjustments to PRONOM, please email [email protected] to let us know.
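As a rough illustration of how the checks listed above could be expressed with pandas, here is a minimal sketch. It is not the code in freud.py. The column names (`TYPE`, `METHOD`, `PUID`, `FORMAT_COUNT`, `EXTENSION_MISMATCH`, `SHA256_HASH`, `SIZE`) are assumed to match a standard DROID CSV export, and the sample rows, the `flag_issues` function and the `allowed_puids` set are hypothetical.

```python
import io

import pandas as pd

# Hypothetical sample standing in for a DROID CSV export (assumed column names).
SAMPLE = """ID,NAME,METHOD,SIZE,TYPE,EXT,EXTENSION_MISMATCH,SHA256_HASH,FORMAT_COUNT,PUID
1,a.doc,Signature,120,File,doc,false,aaa,1,fmt/40
2,b.xyz,,50,File,xyz,false,bbb,0,
3,c.txt,Extension,0,File,txt,false,ccc,1,x-fmt/111
4,d.doc,Signature,120,File,doc,false,aaa,1,fmt/40
5,e.pdf,Signature,300,File,pdf,true,ddd,1,fmt/11
6,f.zip,Container,900,Container,zip,false,eee,1,x-fmt/263
"""


def flag_issues(droid, allowed_puids):
    """Return a dict mapping each check name to the rows it flags."""
    # Folders carry no format identification, so leave them out of every check.
    files = droid[droid["TYPE"] != "Folder"]
    # Only rows that actually have a hash can be compared for duplicates.
    hashed = files.dropna(subset=["SHA256_HASH"])
    # Compare as text so the check works whether pandas parsed booleans or strings.
    mismatch = files["EXTENSION_MISMATCH"].astype(str).str.lower() == "true"
    return {
        "unidentified": files[files["PUID"].isna()],
        "extension_only": files[files["METHOD"] == "Extension"],
        "multiple_ids": files[files["FORMAT_COUNT"] > 1],
        "extension_mismatch": files[mismatch],
        "containers": files[files["TYPE"] == "Container"],
        "duplicates": hashed[hashed.duplicated("SHA256_HASH", keep=False)],
        "zero_byte": files[files["SIZE"] == 0],
        "not_on_white_list": files[
            files["PUID"].notna() & ~files["PUID"].isin(allowed_puids)
        ],
    }


droid = pd.read_csv(io.StringIO(SAMPLE))
results = flag_issues(droid, allowed_puids={"fmt/40", "x-fmt/111", "x-fmt/263"})
for name, rows in results.items():
    print(name, list(rows["NAME"]))
```

Each resulting frame could then be written to its own sheet with `pandas.ExcelWriter`, which needs an Excel engine such as openpyxl to be installed.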
The script requires three files: freud.bat, freud.py and formats-whitlelist.csv. They should be kept in the same folder when running the script.

The script requires Python 3 with the pandas module installed. It has been tested using Python 3.7.5 and pandas 0.25.3. A version of Python that includes pandas can be installed using Anaconda: https://www.anaconda.com/download/

It has been tested on Windows 7 and Windows 10.
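If you are unsure whether your environment meets these requirements, a quick check along the following lines may help. This is only a sketch: the `version_tuple` helper and the `TESTED_PANDAS` constant are hypothetical names, and a pandas version newer than 0.25.3 is assumed to be acceptable but has not been confirmed here.

```python
import sys
from itertools import takewhile

import pandas as pd

TESTED_PANDAS = (0, 25, 3)  # version named in this README; newer versions are an assumption


def version_tuple(text):
    """Turn a version string such as '1.0.0rc1' into a tuple of integers."""
    numbers = []
    for piece in text.split(".")[:3]:
        # Keep only the leading digits so suffixes like 'rc1' are ignored.
        digits = "".join(takewhile(str.isdigit, piece))
        numbers.append(int(digits) if digits else 0)
    return tuple(numbers)


python_ok = sys.version_info >= (3,)
pandas_ok = version_tuple(pd.__version__) >= TESTED_PANDAS

print("Python 3:", python_ok)
print("pandas at least 0.25.3:", pandas_ok)
```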
- Click on the freud.bat file to start the script.
- It will open a window with a prompt to enter the filepath of the DROID csv you want to convert. You can either type the filepath or drag the file into the window to add it. Once added, press Enter. The script will save an output file in the same folder, named after the original file and saved in xlsx format with the ending _freudreslts.xlsx.
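The naming of the output file described above can be sketched as follows. This is an illustration rather than the code in freud.py: the `output_path` function is hypothetical, the suffix is copied as written in this README, and it is assumed that the `.csv` extension is dropped and that dragging a file into the window may wrap the path in double quotes.

```python
from pathlib import PureWindowsPath

SUFFIX = "_freudreslts.xlsx"  # ending as written in this README


def output_path(raw_input, suffix=SUFFIX):
    """Build the output file path from the text entered at the prompt."""
    # Dragging a file into a Windows console can add surrounding quotes.
    cleaned = raw_input.strip().strip('"')
    source = PureWindowsPath(cleaned)
    # Same folder, same base name, with the results suffix replacing '.csv'.
    return source.with_name(source.stem + suffix)


print(output_path('"C:\\exports\\droid.csv"'))
```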