MorphoBrExpansion

An expansion of MorphoBr data through modeling of four word-formation processes by suffixation

Author: Hélio L. B. Silva [email protected]

License: GNU General Public License Version 3 (https://www.gnu.org/licenses/gpl-3.0.txt)

How to cite this work: SILVA, H. L. B. Expansão do MorphoBr através da modelagem computacional de processos de formação de palavras em português. 2019. Dissertação (Mestrado em Linguística) - Programa de Pós-Graduação em Linguística, Universidade Federal do Ceará, Fortaleza, 2019.

MorphoBr data is available at https://github.com/LFG-PTBR/MorphoBr

The suffixes used are -izar, -idade, -vel and -mente. From MorphoBr we extracted the non-hyphenated lemmas and used them as input to the four word-formation processes. The four resulting base files are adjectives.lemas, adverbs.lemas, nouns.lemas and verbs.lemas.

The file v1.lemas was created by extracting the first-conjugation verb lemmas from verbs.lemas, in order to provide base forms for the word-formation process that attaches the suffix -vel.
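The extraction step described above can be illustrated with a minimal Python sketch. It is not the script used in this work: the function name `filter_by_suffix` is hypothetical, and the sketch assumes that a .lemas file holds one lemma per line and that first-conjugation verbs can be identified by the infinitive ending -ar.

```python
def filter_by_suffix(lemmas, suffix):
    """Return the lemmas that end with the given suffix, preserving order."""
    return [lemma for lemma in lemmas if lemma.endswith(suffix)]


# Toy data standing in for the lines of verbs.lemas
# (assumed format: one lemma per line).
verbs = ["amar", "vender", "lavar", "partir"]

# First-conjugation infinitives end in -ar.
v1 = filter_by_suffix(verbs, "ar")
print(v1)  # ['amar', 'lavar']
```

Under the same assumptions, the helper could be called with any other ending to select a different subset of lemmas.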

The following files were created by attaching the suffixes -idade, -izar and -mente to adjective base forms: adjIDADE.lemas, adjIZAR.lemas and adjMENTE.lemas.

The file adjICO.lemas was created by extracting all adjectives ending in -ico, so that their diacritics could be removed separately. The following files were created by attaching the suffixes -idade, -izar and -mente to them: adjICIDADE.lemas, adjICIZAR.lemas and adjICAMENTE.lemas.

The file adjICIDADE-Duplicadas.lemas was produced by mistake during the removal of diacritics. It contains duplicate entries, because European Portuguese and Brazilian Portuguese spellings that differ only in their diacritics ("tónico", "tônico") become identical once the diacritics are removed. The file adjICIDADE.lemas was created by removing these duplicates.
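To show how such duplicates can arise and be removed, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the procedure used in this work: the function names are hypothetical, the derivation is reduced to "strip diacritics, drop the final -o, append -idade", and every combining mark is removed, which is a simplification.

```python
import unicodedata


def strip_diacritics(word):
    """Remove all combining marks (simplification: cedilla and tilde go too)."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def ico_to_icidade(adjective):
    """Derive an -icidade noun from an -ico adjective (assumed toy rule)."""
    base = strip_diacritics(adjective)
    return base[:-1] + "idade"  # drop the final -o, then append -idade


def derive_unique(adjectives):
    """Derive nouns and drop duplicates, keeping the first occurrence."""
    seen = set()
    result = []
    for adjective in adjectives:
        noun = ico_to_icidade(adjective)
        if noun not in seen:
            seen.add(noun)
            result.append(noun)
    return result


# "tónico" (European) and "tônico" (Brazilian) yield the same derived form.
print(derive_unique(["tónico", "tônico", "elétrico"]))
# ['tonicidade', 'eletricidade']
```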

The files subsÇÃO.lemas and subsMENTO.lemas were created by extracting from nouns.lemas the nouns suffixed by -ção and the nouns suffixed by -mento, respectively.

Finally, the following files contain the words generated by our process: novosadjetivos.lemas, novosadverbios.lemas, novossubstantivos.lemas and novosverbos.lemas.

Each of the Transducer folders contains one file of each of the following types: build-fst.xfst, nPoS.lemas and nPoS.lexc.

Adjectival and Verbal Transducer folders also contain their regras.xfst files.

To generate the new forms from a build-fst.xfst file on Linux, open a terminal, navigate to the directory that contains the file, start the xfst compiler, then type the following command and press Enter:

source build-fst.xfst

The next step is to print all the forms from the transducer just created. To do this, type the following command and press Enter:

print words > newwords.dict

The .dict files contain pairs of inflected forms and PoS-tagged forms, following MorphoBr's structure.
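As a rough illustration of how such pairs might be read, the Python sketch below splits one entry into its parts. The layout is an assumption, not something stated here: each line is taken to hold an inflected form, a tab, and then a lemma followed by "+"-separated tags. The function name `parse_entry` and the sample entry with its tags are hypothetical.

```python
def parse_entry(line):
    """Split one entry into (inflected form, lemma, list of tags).

    Assumed layout: "<form>\t<lemma>+<TAG>+<TAG>...".
    """
    form, analysis = line.rstrip("\n").split("\t", 1)
    lemma, *tags = analysis.split("+")
    return form, lemma, tags


# Hypothetical entry; the tag names are illustrative only.
entry = parse_entry("tonicidades\ttonicidade+N+F+PL")
print(entry)  # ('tonicidades', 'tonicidade', ['N', 'F', 'PL'])
```

If the actual files use a different separator or tag layout, only the two `split` calls would need to change.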
