The lexical basis for WFL is the same as that of **Lemlat**, the morphological analyser and lemmatiser for Latin, which has been collated from three Classical Latin dictionaries: _Oxford Latin Dictionary_ \(Glare 1982\); _Ausführliches lateinisch-deutsches Handwörterbuch_ \(Georges and Georges 1913-18\); _Laterculi vocum latinarum_ \(Gradenwitz 1904\). It contains 40,014 lexical entries and **43,432 lemmas** \(more than one lemma can belong to the same lexical entry\). Additionally, the lexical basis of Lemlat has recently been enriched with most of the _Onomasticon_ \(26,250 lexemes out of 28,178\) contained in the Forcellini lexicon \(Budassi and Passarotti 2016\).
Lemlat contains every string of characters required to build the inflectional paradigm of each lexeme, such as the separate stems of verbs with irregular supines \(_duc_-, _duct_- for _duco_ ‘to lead’\) or the genitive stem of imparisyllabic nouns and adjectives \(_crimen_, _crimin_- ‘accusation’\). These strings are fundamental for the automatic processing of word formation rules \(WFRs\). Lemlat also includes graphical variants, such as _obf_-/_off_- in _offero_ ‘to put oneself forward, cause to be encountered’ \(Passarotti and Mambrini, 2012\).
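Because a single lexeme can be reached through several strings, a lookup has to try all of them. The following is a minimal sketch, assuming a hypothetical in-memory dictionary `STRINGS` and a hypothetical function `candidate_lexemes`; it does not reproduce Lemlat's actual storage format or API, and the strings listed for each lexeme are illustrative only.

```python
# Hypothetical toy excerpt, not Lemlat's real data: each lexeme maps to the
# character strings its paradigm needs (several stems, plus graphical variants).
STRINGS = {
    "duco": ["duc", "duct"],         # present stem and supine stem
    "crimen": ["crimen", "crimin"],  # nominative and genitive stem
    "offero": ["offer", "obfer"],    # graphical variants of the prefix
}


def candidate_lexemes(form: str) -> set[str]:
    """Return the lexemes having at least one stored string that begins `form`."""
    return {
        lexeme
        for lexeme, strings in STRINGS.items()
        if any(form.startswith(s) for s in strings)
    }


print(candidate_lexemes("ductum"))    # {'duco'}
print(candidate_lexemes("criminis"))  # {'crimen'}
print(candidate_lexemes("obfero"))    # {'offero'}
```

Storing the genitive stem _crimin_ explicitly is what lets _criminis_ be matched at all, since it does not begin with the nominative _crimen_.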
These strings of characters are used by Lemlat to morphologically analyse and lemmatise input word forms, which are automatically segmented into formative elements. Among these, the lexical element is called **les** \(for “LExical Segment”\). This is the invariable part of the inflected forms, i.e. the sequence – or one of the sequences – of characters that remains the same throughout the inflectional paradigm of a lexeme. The **les** does not necessarily match the word stem. For example, _poet_ is the **les** for the lexeme _poeta_ ‘poet’, as it is the sequence of characters that does not change across the different forms of _poeta_: _poet-a_, _poet-ae_, _poet-am_, _poet-ae_, _poet-arum_, _poet-as_, _poet-is_.

Lemlat includes a **les** archive, in which each **les** is assigned a number of inflectional features. Among these are a tag for the gender of the lexeme \(for nouns\) and a code \(**codles**\) for its inflectional category. For instance, the **codles** for the **les** _poet_ is **n1e** \(first-declension irregular nouns\) and its gender is **m** \(masculine\). For irregular nouns such as _poeta_, there is also a field \(**lem**\) containing the information needed to recognise an irregular ending during lemmatisation \(e.g. _poeta_ sometimes appears with the nom. sg. _poetes_\).
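The definition of the **les** as the invariable part of a paradigm can be illustrated by taking the longest prefix shared by all the forms of a lexeme. This is only a sketch of the definition: in Lemlat the **les** values are stored in the archive rather than computed on the fly, and lexemes with more than one **les** such as _duco_ cannot be captured by a single common prefix. The function names `les_of` and `segment` are hypothetical.

```python
def les_of(forms: list[str]) -> str:
    """Longest prefix shared by all forms of a paradigm (a simplified 'les')."""
    shortest = min(forms, key=len)
    for i, ch in enumerate(shortest):
        # Stop at the first position where some form diverges.
        if any(form[i] != ch for form in forms):
            return shortest[:i]
    return shortest


def segment(form: str, les: str) -> tuple[str, str]:
    """Split a word form into its les and the remaining ending."""
    if not form.startswith(les):
        raise ValueError(f"{form!r} does not contain the les {les!r}")
    return les, form[len(les):]


poeta = ["poeta", "poetae", "poetam", "poetarum", "poetas", "poetis"]
les = les_of(poeta)
print(les)                       # poet
print(segment("poetarum", les))  # ('poet', 'arum')
```

The same split also covers the irregular nom. sg. _poetes_, which yields the ending _es_; recognising that ending as legitimate is the job of the **lem** field.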
WFL makes use of the **les** archive together with a list of **43,432 lemmas** automatically extracted from the Lemlat dataset. Both lists were added as tables to the relational database used while building WFL.
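The two lists can be pictured as tables in a relational database. The sketch below uses Python's built-in `sqlite3` module with a hypothetical schema: the table and column names, and especially the `lemma_id` link between a **les** and its lemma, are assumptions made for illustration and are not the actual WFL database design. The `lem` value is likewise only a placeholder for the irregular-form information.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lemmas (
    id    INTEGER PRIMARY KEY,
    lemma TEXT NOT NULL
);
CREATE TABLE les (
    les      TEXT NOT NULL,
    codles   TEXT NOT NULL,                   -- inflectional category, e.g. n1e
    gender   TEXT,                            -- nouns only, e.g. m
    lem      TEXT,                            -- placeholder for irregular-form info
    lemma_id INTEGER REFERENCES lemmas(id)    -- hypothetical link column
);
""")
conn.execute("INSERT INTO lemmas (id, lemma) VALUES (1, 'poeta')")
conn.execute(
    "INSERT INTO les (les, codles, gender, lem, lemma_id) VALUES (?, ?, ?, ?, ?)",
    ("poet", "n1e", "m", "poetes", 1),
)


def analyse(form: str) -> list[tuple[str, str, str]]:
    """Return (lemma, codles, gender) for every les that begins the form."""
    return conn.execute(
        """
        SELECT lemmas.lemma, les.codles, les.gender
        FROM les JOIN lemmas ON les.lemma_id = lemmas.id
        WHERE ? LIKE les.les || '%'
        """,
        (form,),
    ).fetchall()


print(analyse("poetarum"))  # [('poeta', 'n1e', 'm')]
```

In SQLite the `||` concatenation binds more tightly than `LIKE`, so the condition tests whether the input form starts with the stored **les**.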
For more details about Lemlat and its lexical basis, please refer to Passarotti et al. 2017 \(see the bibliography\).
In the WFL lexical basis, there are three codes for three different kinds of lemmas:
1. B for "basic": lemmas taken from the original Lemlat lexical basis.
2. O for "onomastic": lemmas added from the Forcellini onomastic lexicon. These have not been used to build word formation relations, unless they are the input for a non-onomastic lemma, e.g. _antonius_ 'Anthony' > _antonesco_ 'to behave like Anthony'.
3. F for "fictional": fictional lemmas that have been added during the building of WFL to account for relationships that would otherwise be difficult to fit into the WFL morphotactic database \(see Budassi and Litta 2017\). Fictional entries are marked with a preceding asterisk \('\*'\), the usual sign of a "reconstructed" lemma. However, a fictional lemma need not ever have existed: it only acts as a _trait d'union_ between two attested lemmas.
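The three codes can be modelled as an enumeration attached to each lemma. The sketch below is an assumption, not WFL's implementation: the names `LemmaType`, `Lemma`, `display` and `may_link` are hypothetical, and `may_link` encodes one reading of the rule for onomastic lemmas, namely that they take part in a relation only as the input of a non-onomastic lemma.

```python
from dataclasses import dataclass
from enum import Enum


class LemmaType(Enum):
    BASIC = "B"      # from the original Lemlat lexical basis
    ONOMASTIC = "O"  # from the Forcellini onomastic lexicon
    FICTIONAL = "F"  # added while building WFL


@dataclass(frozen=True)
class Lemma:
    form: str
    kind: LemmaType

    def display(self) -> str:
        """Fictional lemmas are shown with a leading asterisk."""
        return "*" + self.form if self.kind is LemmaType.FICTIONAL else self.form


def may_link(source: Lemma, target: Lemma) -> bool:
    """Decide whether a word formation relation source > target is admissible."""
    if LemmaType.ONOMASTIC not in (source.kind, target.kind):
        return True
    # Onomastic lemmas only take part as the input of a non-onomastic lemma.
    return source.kind is LemmaType.ONOMASTIC and target.kind is not LemmaType.ONOMASTIC


antonius = Lemma("antonius", LemmaType.ONOMASTIC)
antonesco = Lemma("antonesco", LemmaType.BASIC)
print(may_link(antonius, antonesco))  # True
print(may_link(antonesco, antonius))  # False

# Hypothetical placeholder form, not an actual WFL entry.
print(Lemma("placeholder", LemmaType.FICTIONAL).display())  # *placeholder
```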