latin_dict

PANICBUTTON

Overview

This is a simple program that converts the Cambridge Latin Book III dictionary into a CSV file. Possible uses include importing the entries into Anki (CSV files created by this project can be imported directly).

How It's Formatted:
Each entry is a CSV row containing the following values:

Entry index (zero-based line numbering)
Latin word
Gender
English definition
Index of referenced entry (see below)
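
As a rough illustration of the row layout above, the sketch below builds one CSV line from the five values. The function name `format_row` and the sample words are hypothetical and not taken from main.py. It uses Python's standard `csv` module, which quotes any cell that contains a comma; this may differ from how the project itself writes its output.

```python
import csv
import io


def format_row(index, latin, gender="", definition="", ref_index=""):
    """Return one CSV line in the order: index, Latin word, gender,
    English definition, index of referenced entry.

    Hypothetical helper for illustration only.
    """
    buffer = io.StringIO()
    csv.writer(buffer).writerow([index, latin, gender, definition, ref_index])
    # The writer appends a line terminator; drop it to get the bare row.
    return buffer.getvalue().rstrip("\r\n")


# Made-up sample entries, not copied from the dictionary.
print(format_row(0, "ancilla", "f.", "slave-girl, maid"))
print(format_row(1, "amicus", "m.", "friend"))
```

Cells with no value (for example, the gender of a verb or a missing reference index) are simply left empty in this sketch.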

How It Works:
The Cambridge Latin Book III dictionary stores all of its entries in a single large string on line 178 of the page's source. Individual entries are delimited by carets (^), and words are separated from their definitions by dollar signs ($).

The program creates a list for each entry, consisting of the entry's initial components (Latin word and definition). Each entry list is then nested in the global entries list.
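
The splitting step described above can be sketched as follows. This is a minimal example, assuming the raw string has already been loaded into memory; the function name `parse_entries` and the sample string are hypothetical and not taken from main.py.

```python
def parse_entries(raw):
    """Split the raw dictionary string into [latin_word, definition] lists."""
    entries = []
    for chunk in raw.split("^"):
        chunk = chunk.strip()
        if not chunk:
            continue  # skip empty pieces, e.g. after a trailing caret
        # Everything before the first "$" is the Latin word,
        # everything after it is the definition.
        word, _, definition = chunk.partition("$")
        entries.append([word.strip(), definition.strip()])
    return entries


# Made-up sample in the "word$definition^word$definition" layout.
sample = "amicus$friend^puella$girl^"
print(parse_entries(sample))
```

An entry's position in the returned list can serve as its zero-based entry index.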

Some noun entries have genders, denoted with "m." for masculine, "n." for neuter, or "f." for feminine. The program locates these and migrates them from the Latin word cell to the gender cell.
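
One way to move a gender marker out of the Latin word cell is sketched below. It assumes the marker appears as a standalone token ("m.", "f." or "n.") somewhere in the cell; the name `split_gender` and the sample cells are hypothetical, and the actual logic in main.py may differ.

```python
import re

# Matches "m.", "f." or "n." only when not glued to other letters,
# so the "n." inside "gen." is not mistaken for a neuter marker.
GENDER_PATTERN = re.compile(r"(?<![A-Za-z])([mfn]\.)(?![A-Za-z])")


def split_gender(latin_cell):
    """Return (latin_word_without_gender, gender); gender is "" if absent."""
    match = GENDER_PATTERN.search(latin_cell)
    if match is None:
        return latin_cell.strip(), ""
    gender = match.group(1)
    remaining = latin_cell[:match.start()] + latin_cell[match.end():]
    # Collapse any double spaces left behind by the removal.
    remaining = " ".join(remaining.split())
    return remaining, gender


print(split_gender("amicus m."))
print(split_gender("ambulo"))
```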

Some definition values reference other entries in the dictionary rather than providing their own definitions. These definitions are formatted as "{see} [referenced_entry_word]". The program locates each referenced entry and copies its Latin word and definition into the definition cell of the entry that references it. It also stores the index of the referenced entry in the referencing entry.
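
The reference lookup can be sketched roughly as below. This example assumes each entry is a list of the form [latin_word, gender, definition, ref_index], that a referencing definition starts with "{see}" followed by the referenced word, and that the copied values are joined as "word: definition". These are illustrative assumptions, as is the name `resolve_references`.

```python
SEE_PREFIX = "{see}"


def resolve_references(entries):
    """Fill in "{see} word" definitions from the entries they point to."""
    # Map each Latin word to the index of its first occurrence.
    index_by_word = {}
    for i, entry in enumerate(entries):
        index_by_word.setdefault(entry[0], i)

    for entry in entries:
        definition = entry[2]
        if not definition.startswith(SEE_PREFIX):
            continue
        target_word = definition[len(SEE_PREFIX):].strip()
        target_index = index_by_word.get(target_word)
        if target_index is None:
            continue  # referenced word not found; leave the entry unchanged
        target = entries[target_index]
        entry[2] = f"{target[0]}: {target[2]}"
        entry[3] = target_index
    return entries


# Made-up sample entries, not copied from the dictionary.
entries = [
    ["fero", "", "I bring", ""],
    ["tuli", "", "{see} fero", ""],
]
print(resolve_references(entries))
```

This sketch does not follow chains of references (an entry that points to another referencing entry); it only resolves one level.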

Additional Formatting:

Some entries are preceded by asterisks (*). The program removes these.
Some entries label genitive forms of Latin words with "{gen.}". The program preserves the genitive label, but removes the curly braces.
Some entries include plural forms of words, labeled with "pl." These are given a leading space for readability.
Some entries contain percent signs (%), originally used to denote sub-entries. For now, these are simply removed; future commits of latin_dict may implement sub-entry denotation and formatting.
Entries with multiple words in the Latin word and/or definition cells were originally comma-delimited. To prevent CSV formatting issues, these commas are replaced with hyphens (-) while the program runs, then restored to commas in the final CSV output.
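
The clean-up rules above could look something like the following. This is a sketch under assumptions, not the code in main.py: the function names are hypothetical, and the "pl." rule assumes a space is only needed when the label is attached directly to the preceding text.

```python
import re


def clean_cell(text):
    """Apply the formatting rules listed above to one cell."""
    text = text.replace("*", "")             # drop leading asterisks
    text = text.replace("%", "")             # drop sub-entry markers
    text = text.replace("{gen.}", "gen.")    # keep the label, lose the braces
    # Give "pl." a leading space when it is glued to the preceding text.
    text = re.sub(r"(?<=\S)pl\.", " pl.", text)
    return text.strip()


def protect_commas(text):
    """Swap commas for hyphens while the entry is being processed."""
    return text.replace(",", "-")


def restore_commas(text):
    """Swap hyphens back to commas for the final output.

    Caveat of this naive sketch: a hyphen that was present in the
    original text would also become a comma.
    """
    return text.replace("-", ",")


print(clean_cell("*rex, {gen.} regis"))
print(restore_commas(protect_commas("friend, ally")))
```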

License

GNU General Public License
