This repository contains scripts used in the work of the US National Archives.
Use the issues page to suggest new scripts that others may be able to help write.
- 1940census.py - Transform 1940 Census metadata for inclusion in the National Archives Catalog (Python 2)
- amara.py - Transform Amara video transcriptions for addition to the National Archives Catalog (Python 2)
- combinexml-py2.py - Combine multiple XML files in a directory into single files of 75 MB or less (Python 2)
- csv-add-headers.py - Add headers to a CSV file (Python 3)
- csv-to-xml.py - Convert CSV to simple XML (Python 3)
- downloadurls-py2.py - Download all files from URLs listed in a text file (Python 2)
- downloadurls-py3.py - Download all files from URLs listed in a text file (Python 3)
- file-units.py - Convert file unit submission spreadsheet (CSV) into DAS-compliant XML (Python 3)
- ocr-jpg.py - Generate OCR data from JPG files (Python 2)
- Dependencies:
- pdf.py - Convert PDF documents to JPGs (Python 2)
- s3_file_list.py - Generate a CSV listing of files on S3 cloud storage (Python 2)
- rename.py - Rename files by replacing specific characters in their names (Python 3)
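
To illustrate the kind of transformation described for csv-to-xml.py, here is a minimal Python 3 sketch of converting CSV to simple XML. It is not the script from this repository. The function name `csv_to_xml`, the `records` and `record` element names, and the sample data are all assumptions made for illustration, and the sketch assumes every column header is already a valid XML element name.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text, root_tag="records", row_tag="record"):
    """Convert CSV text with a header row into a simple XML string.

    Hypothetical helper: the tag names are illustrative defaults,
    not the ones used by csv-to-xml.py.
    """
    root = ET.Element(root_tag)
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        record = ET.SubElement(root, row_tag)
        for column, value in row.items():
            # Assumes each header is already a valid XML element name.
            field = ET.SubElement(record, column)
            field.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Made-up sample data, used only to show the output shape.
    sample = "title,year\nCensus Schedule,1940\n"
    print(csv_to_xml(sample))
```

Each CSV row becomes one `record` element, with one child element per column. Headers containing spaces or other characters that are not allowed in XML names would need to be cleaned up first.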