Yiddish pulp fiction (Shund) transcription/translation

Cook4986/Shundlikht

Shundlikht

Shundlikht is a Python program (Jupyter Notebook) that automatically transcribes, optionally translates, and collates the text embedded in the PDF images associated with each installment of a given work in the Shund.org database. The images for each installment are hosted by the National Library of Israel (NLI).
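As a rough illustration of the transcribe, translate, collate flow described above, here is a minimal sketch. It is not the notebook's code: `collate_installments`, `transcribe`, and `translate` are hypothetical names, and the OCR and translation steps are passed in as plain callables so that no particular service API is assumed.

```python
def collate_installments(images, transcribe, translate=None):
    """Transcribe each installment in order, optionally translate, and join the results.

    `images` is any ordered iterable of installment images (or paths to them).
    `transcribe` and `translate` are hypothetical callables standing in for
    the OCR and translation services; translation is skipped when
    `translate` is None.
    """
    parts = []
    for image in images:
        text = transcribe(image)
        if translate is not None:
            text = translate(text)
        parts.append(text.strip())
    # Separate installments with a blank line in the collated output.
    return "\n\n".join(parts)
```

Keeping the two services behind callables makes the translation step genuinely optional and lets the collation logic be checked without network access.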

Dependencies

Usage

  1. Create a local directory named after the target Shund.org work
  2. Export the Shund.org search results associated with the work (See: image, below)
  3. Place the exported CSV in the working directory
  4. In the ipynb "Globals":
     • Set Google Application Credentials
     • Set "workDir" to the working directory pathname
  5. "Run All" cells
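The "Globals" step might look roughly like the sketch below. Only `workDir` and the Google Application Credentials setting come from the steps above; the function names, the example paths, and the assumption that the working directory contains exactly one exported CSV are hypothetical.

```python
import os
from pathlib import Path


def configure_globals(credentials_path, work_dir):
    """Set the credentials variable and return the working directory as a Path."""
    # GOOGLE_APPLICATION_CREDENTIALS is the environment variable that
    # Google Cloud client libraries read to locate a service-account key.
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = str(credentials_path)
    workDir = Path(work_dir).expanduser()
    if not workDir.is_dir():
        raise FileNotFoundError(f"working directory not found: {workDir}")
    return workDir


def find_export_csv(workDir):
    """Return the exported CSV in workDir (assumes exactly one is present)."""
    csv_files = sorted(Path(workDir).glob("*.csv"))
    if len(csv_files) != 1:
        raise ValueError(
            f"expected exactly one CSV in {workDir}, found {len(csv_files)}"
        )
    return csv_files[0]
```

For example, `workDir = configure_globals("~/keys/service-account.json", "~/shund/my-work")` followed by `find_export_csv(workDir)`, where both paths are placeholders.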

TO-DO

  • Annotate filepaths with language code extension, if target language differs from source
  • Add spelling correction
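One possible shape for the first TO-DO item is sketched below. The `annotate_path` helper and the choice to place the language code before the file suffix are assumptions, not part of the current notebook; `yi` and `en` are the ISO 639-1 codes for Yiddish and English.

```python
from pathlib import Path


def annotate_path(path, source_lang, target_lang):
    """Insert the target language code before the suffix when it differs from the source."""
    path = Path(path)
    if target_lang == source_lang:
        # No translation took place, so the filename is left unchanged.
        return path
    return path.with_name(f"{path.stem}.{target_lang}{path.suffix}")
```

With this convention, `annotate_path("work/part_01.txt", "yi", "en")` yields `work/part_01.en.txt`, while an untranslated file keeps its original name.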

Caveats

  • Runs locally
  • NLI may attempt to block automated requests
  • Translations may contain errors, in addition to any errors in the transcriptions
  • GCT (Google) is a black box
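Because NLI may block automated requests, one common mitigation is to retry failed downloads with increasing pauses. The sketch below is an assumption about how that could be done, not something the notebook is known to do; `fetch_with_backoff`, its parameters, and the injected `fetch` callable are all hypothetical.

```python
import time


def fetch_with_backoff(fetch, url, retries=4, base_delay=2.0, sleep=time.sleep):
    """Call fetch(url), doubling the wait after each failed attempt.

    `fetch` is any callable that downloads `url` and raises on failure.
    The final failure is re-raised so the caller can decide what to do.
    """
    for attempt in range(retries):
        try:
            return fetch(url)
        except Exception:
            if attempt == retries - 1:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

Catching every exception keeps the sketch short; a real version would likely narrow this to network and HTTP errors.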

Matt Cook - 2023
