EpubToTxt

Gets all the text content from an epub and saves it as txt.

To run it:

go run main.go -epub=<epub file> -regex=<regex file> -output=<output directory>

A regex file can be added to replace certain parts of the epub content.

The i-th (first) line in the regex file defines the regex to match, and the (i+1)-th (second) line defines what to replace each match with.

I wanted to remove everything between <rt> tags because they mess up my text parsing software, so my regex file looks like this:

<rt>(.*?)</rt>
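
To illustrate how such a regex file might be applied, here is a minimal sketch in Go. It is not taken from main.go: the names rule, parseRules and applyRules are hypothetical, and it assumes that a pattern with no following line gets an empty replacement, so that every match is deleted.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// rule pairs a compiled pattern with its replacement text.
// (Hypothetical type, for illustration only.)
type rule struct {
	pattern     *regexp.Regexp
	replacement string
}

// parseRules reads the text of a regex file as pattern/replacement pairs:
// line i is the pattern, line i+1 is the replacement.
// Assumption: a pattern with no following line gets an empty replacement.
func parseRules(content string) ([]rule, error) {
	var lines []string
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	if err := sc.Err(); err != nil {
		return nil, err
	}

	var rules []rule
	for i := 0; i < len(lines); i += 2 {
		re, err := regexp.Compile(lines[i])
		if err != nil {
			return nil, fmt.Errorf("regex file line %d: %w", i+1, err)
		}
		repl := ""
		if i+1 < len(lines) {
			repl = lines[i+1]
		}
		rules = append(rules, rule{pattern: re, replacement: repl})
	}
	return rules, nil
}

// applyRules runs every rule over the text in order.
// Note: ReplaceAllString expands $1, $2, ... in the replacement text.
func applyRules(text string, rules []rule) string {
	for _, r := range rules {
		text = r.pattern.ReplaceAllString(text, r.replacement)
	}
	return text
}

func main() {
	// The regex file from above: one pattern line, empty replacement.
	rules, err := parseRules("<rt>(.*?)</rt>\n")
	if err != nil {
		panic(err)
	}
	fmt.Println(applyRules("漢<rt>かん</rt>字<rt>じ</rt>", rules)) // prints "漢字"
}
```

The lazy quantifier `.*?` matters here: with a greedy `.*`, a line containing several <rt> tags would be matched from the first opening tag to the last closing tag, removing the text in between as well.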