Shundlikht is a Python program (a Jupyter Notebook) that automatically transcribes, optionally translates, and collates the text embedded in the PDF images associated with each installment of a given work in the Shund.org database. These images are hosted by the National Library of Israel.
- Create local directory, named after the target Shund.org work
- Export Shund.org search results associated with work (See: image, below)
- Place exported CSV in the working directory
- In the ipynb "Globals" cell:
  - Set Google Application Credentials
  - Set "workDir" to the working directory pathname
- "Run All" cells
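As a rough illustration of the "Globals" step above, the following is a minimal sketch and not the notebook's actual code. It assumes the credentials are a service-account JSON key exposed through the standard `GOOGLE_APPLICATION_CREDENTIALS` environment variable; the function name `configure`, the constant `CREDENTIALS_JSON`, and both placeholder paths are hypothetical.

```python
import os
from pathlib import Path

# Hypothetical placeholders: replace with your own key file and work directory.
CREDENTIALS_JSON = "/path/to/service-account-key.json"
workDir = "/path/to/work-title"


def configure(credentials_json, work_dir):
    """Expose the key file to Google client libraries and find exported CSVs.

    Returns the working directory as a Path and a sorted list of the CSV
    files found directly inside it.
    """
    # Google client libraries read this environment variable to find credentials.
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = str(credentials_json)

    work_path = Path(work_dir)
    # Create the directory if it does not exist yet (assumed convenience).
    work_path.mkdir(parents=True, exist_ok=True)

    # The exported Shund.org search results are expected here as a CSV.
    csv_files = sorted(work_path.glob("*.csv"))
    return work_path, csv_files
```

Calling `configure(CREDENTIALS_JSON, workDir)` before "Run All" would return an empty list if the exported CSV has not yet been placed in the working directory, which makes a missing export easy to spot.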
- Annotate filepaths with language code extension, if target language differs from source
- Add spelling correction
- Runs locally
- NLI may attempt to block automated requests
- Translations may contain errors, in addition to any errors in the transcriptions
- GCT (Google) is black-boxed
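The item above about annotating file paths with a language code could be approached along these lines. This is only a sketch under assumptions: the function name `annotate_path`, the choice to insert the code before the file extension, and the example codes `yi` and `en` are all hypothetical and not taken from the notebook.

```python
from pathlib import Path


def annotate_path(path, source_lang, target_lang):
    """Insert the target language code before the file extension.

    The path is returned unchanged when the target language is the same
    as the source language, so untranslated output keeps its plain name.
    """
    p = Path(path)
    if target_lang == source_lang:
        return p
    # e.g. "installment_01.txt" becomes "installment_01.en.txt"
    return p.with_name(f"{p.stem}.{target_lang}{p.suffix}")
```

For example, `annotate_path("out/installment_01.txt", "yi", "en")` yields `out/installment_01.en.txt`, while passing `"yi"` for both languages leaves the path as it was.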