Bookbinder & Antiquarian

Text cleanup.

Bookbinder.py cleans text with regular expressions; Antiquarian.py cleans text with an AI model. openai_batch_api/batch.ipynb runs the AI cleanup in batches and prepares fine-tuning files.
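
As a rough illustration of the regex-based approach, here is a minimal sketch. It does not reproduce the contents of Bookbinder.py: the `RULES` list and the `clean_text` function are hypothetical names chosen for this example, and the actual patterns will depend on the project.

```python
import re

# Hypothetical rules, not taken from Bookbinder.py.
# Each entry is a (compiled pattern, replacement) pair, applied in order.
RULES = [
    (re.compile(r"(\w)-\n(\w)"), r"\1\2"),  # rejoin words hyphenated across a line break
    (re.compile(r"[ \t]+\n"), "\n"),        # strip trailing whitespace before a newline
    (re.compile(r"\n{3,}"), "\n\n"),        # collapse runs of blank lines to one
    (re.compile(r"[ \t]{2,}"), " "),        # collapse repeated spaces and tabs
]


def clean_text(text: str) -> str:
    """Apply every rule in order and trim the result."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text.strip()


if __name__ == "__main__":
    sample = "An exam-\nple   page.  \n\n\n\nNext para."
    print(clean_text(sample))
```

Keeping the rules in an ordered list means a project can add, remove, or reorder patterns without touching the function itself.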

How to

Because every project has different requirements, the configuration and scripts have to be adjusted by hand each time.

  1. Edit config files
  2. Edit python scripts
  3. Run scripts

TODO

  • Fix cost estimation so that it covers only the file that was generated.
  • Fix the batch notebook so that it works when a batch job uses several files.
  • Refactor.
  • Add segmentation.
  • Add deferral to save on compute.