Release notes

Version 0.77

  • LibreOffice 24.8 is used by default if available to fix false negatives with some MS Office files

Version 0.76

  • LxmlScraper will now consider XML files with US-ASCII encoding declaration valid if --charset=UTF-8 parameter was used
  • Improve scraper scrape-file help text
  • Add GHOSTSCRIPT_PATH configuration field to run externally packaged Ghostscript by default
  • RPM package now supports Ghostscript 10.03.1, which fixes some PDF files being erroneously detected as invalid
  • Fix verapdf in $PATH always overriding VERAPDF_PATH configuration value
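
The VERAPDF_PATH fix above is about lookup precedence. As a minimal sketch of that precedence, assuming a hypothetical helper named resolve_executable (this is not file-scraper's actual code), an explicitly configured path wins and $PATH is consulted only as a fallback:

```python
import os
import shutil
from typing import Optional


def resolve_executable(configured_path: Optional[str], name: str) -> Optional[str]:
    """Return the executable to run, preferring the configured path.

    Hypothetical helper for illustration; not part of file-scraper's API.
    """
    # An explicitly configured, runnable file takes precedence over $PATH.
    if (
        configured_path
        and os.path.isfile(configured_path)
        and os.access(configured_path, os.X_OK)
    ):
        return configured_path
    # Otherwise search $PATH; shutil.which returns None when nothing is found.
    return shutil.which(name)
```

With this ordering, a tool that happens to be on $PATH can no longer shadow the configured value.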

Version 0.75

  • Filter out XML-incompatible characters from scraper output.
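
One way such filtering can be done is sketched below. It assumes "XML-incompatible" means code points outside the XML 1.0 Char production. The function name strip_xml_incompatible is hypothetical, and the scraper's real implementation may differ:

```python
import re

# Matches every code point outside the XML 1.0 "Char" production, i.e.
# anything except tab, LF, CR, U+0020-U+D7FF, U+E000-U+FFFD and
# U+10000-U+10FFFF.
_XML_INCOMPATIBLE = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_xml_incompatible(text: str) -> str:
    """Remove characters that cannot appear in an XML 1.0 document.

    Hypothetical helper for illustration; not file-scraper's API.
    """
    return _XML_INCOMPATIBLE.sub("", text)
```

Control characters such as NUL or vertical tab are dropped, while tabs, newlines and ordinary text pass through unchanged.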

Version 0.74

  • Check that audio and video streams inside a container are supported by that specific container.
  • Improve processing for very large XML files
  • Update EPUB version support from 3.2 to 3.3

Version 0.73

  • Identify some CSV files as text/csv instead of application/csv
  • Fix crash due to incorrectly detecting a ZIP file and attempting to parse it
  • Process certain text based Windows configuration files, identified by Magic, as plain text
  • Support ImageMagick versions newer than v6.9.12.88

Version 0.72

  • Loosen SEG-Y detection requirements when SEG-Y version declaration is missing
    • SEG-Y headers with left-padded card numbers are allowed
    • SEG-Y headers with card markers without numbers are allowed
    • SEG-Y headers with "C40 EOF." as the end-of-header card are allowed

Version 0.71

  • Replace VerapdfDetector with ExifToolDetector for detecting PDF/A files.

Version 0.70

  • Add Ghostscript's stdout to errors for invalid PDF files.

Version 0.69

  • Remove missing system path warning when importing file-magic/libmagic.

Version 0.68

  • Detect format version of ODF files correctly
  • Loosen SEG-Y detection requirements: empty SEG-Y header is now allowed.
  • Installation instructions for AlmaLinux 9 using RPM packages

Version 0.67

  • Add support for h265 (HEVC) video streams.
  • Update the following mimetypes:
    • audio/mp4 to audio/aac for AAC streams
    • video/mp4 to video/h264 for AVC streams.

Version 0.66

  • Fix a bug causing PDF files with warnings (but not severe errors) to be detected as not well-formed.
  • Add support for JP2 files.

Version 0.65

  • Support Apple M4A AAC files
  • The RPM package conflicts with ffmpeg-free, because the ffmpeg-free package does not have all the codecs file-scraper needs
  • Modernised Python source code with pyupgrade and some manual cleanups

Version 0.64

  • Fix a bug related to ffmpeg that caused validation to fail with some video files.
  • Remove some Python 2 remainders from the code.

Version 0.63

  • Add a config file for executable paths
  • JSON files are now detected as plain text
  • Add note to dummy_scraper.py on formatVersion not being supported with mimetype text/plain

Version 0.62

  • Change well-formedness results of the following scrapers, because they do not validate:
    • ExifTool Scraper
    • Magic Scraper
    • Textfile Scraper: TextfileScraper and TextEncodingMetaScraper

Version 0.61

  • Change well-formedness result of Wand Scraper, because it does not validate.
  • Increase stack size for Schematron compilation.
  • File magic version fix for CentOS7 installation.

Version 0.60

  • Add RHEL9 compatibility.
  • Change well-formedness result of PIL Scraper, because it does not validate.

Version 0.59

  • Update info message regarding PDF files.
  • Fix python2 warc-tools requirement in python3 spec file.

Version 0.58

  • Add grade for DPX version 1.0.

Version 0.57

  • Differentiate MPEG-1 PS and MPEG-2 PS containers.
  • Add support for multi-frame TIFF/PNG images.

Version 0.56

  • Add SEG-Y file format detection and grade it as bit-level file format.

Version 0.55

  • Python 2.7 support officially removed.

Version 0.54

  • Fix WMA and WMV file data rate detection.
  • Changed grading according to version 1.11.0 of DPS File Formats specifications.
  • Fix wrong script paths.
  • Add missing return code handling to multiple scrapers.
  • Fix color detection for specific WMV files.

Version 0.53

  • Add support for SIARD file format.
  • Add support for WMA and WMV file formats.
  • Fix issue where FFmpeg was run even though file format well-formed check was skipped.

Version 0.52

  • Add support for AIFF file format.

Version 0.51

  • Add support for DNG file format versions 1.1 and 1.2.

Version 0.49-0.50

  • Pin file-magic version 0.4.0 or less since newer version requires a newer libmagic than CentOS 7 ships by default.

Version 0.48

  • Make scraper functional with veraPDF older than 1.18. In older versions, .pdf file extension is required for the PDF files.
  • Fix veraPDF command similar to JHOVE command.
  • Handle possible errors found in file format detection properly.
  • Allow wand to deliver EXIF version as ASCII codes or plain text.

Version 0.47

  • Add test case for file-5.30 recursion bug

Version 0.46

  • Improve LxmlScraper's error handling.

Version 0.45

  • Fix scraper not being able to scrape PDF files that do not have .pdf file extension. This requires veraPDF 1.18 or newer.

Version 0.44

  • Update installation guide for Python 3.6 in README.rst.
  • Add DNG file format support.
  • Fix DV file format detection.
  • Update requirements in setup file.

Version 0.43

  • Add MPEG-4 version 2 (ISO/IEC 14496-14) video container support.

Version 0.42

  • Add support for JHove 1.24.1.
  • Fix bug in QuickTime identification.
  • Add EPUB support to file scraper.

Version 0.41

  • Fix bug caused by wand trying to UTF-8 decode latin-1 Exif field values. WandScraper will not try to handle Exif field values that it does not use.

Version 0.40

  • Changed grading according to version 1.10.0 of DPS File Formats specifications
  • Changed the name ContainerGrader to a more precise ContainerStreamsGrader
  • Added quote character support for CSV files.

Version 0.39

  • Update version number in file_scraper/__init__.py

Version 0.38

  • Fix bug in detecting missing files when mimetype option was given

Version 0.37

  • Use LibreOffice 7.2 to scrape MS Office formats. This fixes stuck processes with certain MS Excel files.

Version 0.35-0.36

  • Minor fix in e2e tests.

Version 0.34

  • Changes in PDF scraping:
    • Both JHove and Ghostscript are now run for all PDF files, but the scraping results are ignored if the file is not supported by the tool.
    • Added PDF root version reporting to JHove scraper output
  • Select Python 2/3 version of dpx-validator depending on the current environment.
  • Added grades for files into the scraper output. The grade defines whether a file is recommended or suitable for digital preservation.
  • Well-formed result is unknown for non-supported file or stream formats.
  • MIME type is (usually) given even if there is no scraper implementation.
  • Added ProRes grading as bit-level format with recommended format.
  • Added video/avi support.

Version 0.33

  • Unknown text encodings are processed without failing
  • Forbidden characters set is expanded for ISO-8859-15 charsets
  • Better handling of local XML schema file paths

Version 0.32

  • Fix PDF version detection
  • Remove ARC file format support
  • Update PRONOM codes for file formats
  • Handle conflicts between scraper results in a new scraper
  • Update MS Office version handling

Version 0.31

  • Build el7 python3 rpms
  • Fix scraper CLI in python3

Version 0.30

  • Filter out Unicode normalization warnings

Version 0.29

  • Fix illegal control characters being printed in scraper error messages
  • Minor fixes related to schema cleanup

Version 0.28

  • Fix accidental set-type value

Version 0.27

  • Build el8 rpms
  • Fix Fido caching bug

Version 0.26

  • Support for JPEG/EXIF files with older file magic library, tested with 5.11

Version 0.25

  • Support validation of XML files with relative path to local schemas

Version 0.24

  • Increase maximum CSV field size
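
For context, Python's csv module rejects fields longer than its configured limit, and csv.field_size_limit() adjusts that limit. The sketch below shows the mechanism only. The NEW_LIMIT value and the read_rows helper are assumptions for illustration, since these notes do not state the value file-scraper uses:

```python
import csv
import io

# Hypothetical limit for illustration; the actual value is not given here.
NEW_LIMIT = 10 * 1024 * 1024


def read_rows(text: str, limit: int = NEW_LIMIT) -> list:
    """Parse CSV text with a raised per-field size limit."""
    # field_size_limit(new) sets the limit and returns the previous one.
    previous = csv.field_size_limit(limit)
    try:
        return list(csv.reader(io.StringIO(text)))
    finally:
        # Restore the previous limit so other callers are unaffected.
        csv.field_size_limit(previous)
```

Restoring the previous limit keeps the change local to one parse rather than altering the module-wide setting for the whole process.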

Version 0.23

  • Fix colorspace value handling and add support for ICC profile name
  • Remove JPEG2000 from AVI and AVC/AAC from MPEG-1/2 PS to meet the current specifications
  • Support newer version of veraPDF

Version 0.22

  • FLAC stream support for Matroska videos added
  • MIME type update for LPCM streams
  • Wand memory leaking issues fixed
  • Filter unnecessary v.Nu warnings related to HTML5 validation
  • Distinguish JP2 and JPX files

Version 0.21

  • Add command-line interface

Version 0.20

  • Add key to info dict to contain used tools in scraping
  • Minor bugfix related to unavailable file format version

Version 0.19

  • Raise maximum image size for PIL
  • Add support for images with grayscale+alpha channels

Version 0.18

  • Tests have been updated to match changed Wand and ImageMagick error messages.

Version 0.17