Skip to content

Cleaner script to normalize knock's output EPUBs

License

Notifications You must be signed in to change notification settings

Pi2048/clean-epub

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 

Repository files navigation

clean-epub

The excellent knock application by Benton Edmondson outputs EPUBs that seem to be DRM-free. However, if you run the application twice on the same ACSM file, the hashes do not match.

This script normalizes EPUB files, and it is specifically written to normalize the output files of knock. It strips away all the differences between different EPUB files for the same book.

Usage

./clean-epub.py -i input.epub -o output.epub

Details

In essence, it does this:

  • Create a temporary directory, and unzip the input EPUB into it
  • Set all the access and modification times for files and directories in the temporary directory to a fixed date
  • Zip the contents of the temporary directory into a new EPUB, without adding extra file attributes

I tested it on Ubuntu 20.04, with EPUBs bought from Bol(dot)com.

There are three reasons why you might want to test it more extensively before depending on it:

  • Ebooks from other sellers might contain more identifying details than just the timestamps and order of the files in the zip.
  • The zip implementation of another OS might not deterministically order the input files of the new zip file.
  • Your OS might track more file metadata fields than just access and modification times (not sure if this exists)

About

Cleaner script to normalize knock's output EPUBs

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages