Brazilian Name Generator

Create cool and awkward names with Language Models!

Table of Contents

  1. About The Project
  2. Getting Started
  3. Usage
  4. License
  5. Contact
  6. Acknowledgements

About The Project

Predicted the name: RUNNATAIDENILOS
Prefix: RU
Context Size: 2
Seed: 1

Language models assign a probability to a word or even a whole sentence. They correct the misspelled words you type on your cell phone and help your personal assistant understand you.

In this fun project, I used them to make a probabilistic model of the characters of Brazilian names using data from the 2010 census. Then, I used these models to generate new names.

It works by guessing the next letter based on the previous ones. For instance, what is the most probable name given that it starts with Pau...? For English it will probably be Paul, while for Portuguese it will be Paulo. However, if we use a small enough context size (i.e., the number of previous letters used to infer the next one), awkward and cool names start to appear =)
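The idea above can be illustrated with a minimal sketch of a character-level model. This is not the code in this repo: the function names `train` and `generate`, the `^` and `$` markers, and the toy name list are all assumptions made for illustration, and the real models are trained on the census data.

```python
import random
from collections import Counter, defaultdict

START, END = "^", "$"  # hypothetical padding and end-of-name markers


def train(names, context_size):
    """Count which character follows each context of `context_size` characters."""
    counts = defaultdict(Counter)
    for name in names:
        padded = START * context_size + name.upper() + END
        for i in range(context_size, len(padded)):
            context = padded[i - context_size:i]
            counts[context][padded[i]] += 1
    return counts


def generate(counts, context_size, prefix="", seed=None, max_len=20):
    """Sample one character at a time until the end marker is drawn."""
    rng = random.Random(seed)
    name = prefix.upper()
    while len(name) < max_len:
        padded = START * context_size + name
        context = padded[len(padded) - context_size:]
        options = counts.get(context)
        if not options:  # unseen context: stop instead of guessing
            break
        chars, weights = zip(*options.items())
        next_char = rng.choices(chars, weights=weights)[0]
        if next_char == END:
            break
        name += next_char
    return name


# Toy data, not the census dataset.
counts = train(["PAULO", "PAULA", "PAULINO"], context_size=2)
print(generate(counts, context_size=2, prefix="pau", seed=0))
```

With only two letters of context, the sketch can stitch together pieces of different names, which is where the unusual names come from. A larger context leaves fewer choices at each step, so the output stays closer to names seen in training.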

Built With

Getting Started

You can use this project with Docker or install it locally on your machine.

Prerequisites

  • Docker

or

Local Installation

  1. Clone the repo
    git clone https://github.com/renan-cunha/NameGeneratorBR
    cd NameGeneratorBR/
  2. Create environment
    make create_environment
    conda activate NameGeneratorBR
    
  3. Install requirements
    make requirements
    

Usage

The repo has five trained models, with context sizes from 0 (i.e., the next letter is predicted only from how often each letter appears in the dataset) to 4 (i.e., the previous four letters are used to infer the next one).
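To make the difference between context sizes concrete, here is a small sketch that estimates the probability of the next letter by relative frequency. The helper `next_char_probs` and the two toy names are hypothetical and are not part of this repo.

```python
from collections import Counter


def next_char_probs(names, context):
    """Estimate P(next char | context) by relative frequency in `names`."""
    n = len(context)
    counts = Counter()
    for name in names:
        for i in range(n, len(name)):
            # With an empty context (size 0) this matches every position.
            if name[i - n:i] == context:
                counts[name[i]] += 1
    total = sum(counts.values())
    return {ch: c / total for ch, c in counts.items()} if total else {}


toy_names = ["ANA", "ANITA"]  # toy data, not the census dataset
print(next_char_probs(toy_names, ""))    # context size 0: plain letter frequencies
print(next_char_probs(toy_names, "AN"))  # context size 2: letters seen after "AN"
```

With an empty context every letter in the data is counted, so the estimate is just the overall letter frequency. With the context `AN`, only the letters that followed `AN` in the toy names are counted.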

Generate New Names

If you just want to generate a new name, run src/models/predict_model.py with the following options:

Usage: predict_model.py [OPTIONS]

Options:
  -cs, --context_size INTEGER  How much context to use for the language model,
                               The pre-trained models go from 0 to 4
  -p, --prefix TEXT            The beginning of the name to be predicted (OPTIONAL)
  -s, --seed INTEGER           Seed to reproduce experiments (OPTIONAL)
  --help                       Show this message and exit.

Example:

(NameGeneratorBR) renan@DESKTOP-AD25DOI:~/git/NameGeneratorBR$ python src/models/predict_model.py -cs 4 -p pau -s 0
Predicted the name: PAULO
Prefix: PAU
Context Size: 4
Seed: 0

Reproduce Training

To reproduce the training, run the command below:

make train_model

Docker

Pull the image

docker pull renancunha97/name-generator-br

Then generate new names:

renan@DESKTOP-AD25DOI:~$ docker run renancunha97/name-generator-br -cs 4 -p pau -s 0
Predicted the name: PAULO
Prefix: PAU
Context Size: 4
Seed: 0  

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Renan Cunha - [email protected]

Acknowledgements

If you are curious about Language Models and Natural Language Processing in general, I highly recommend the draft of Speech and Language Processing (3rd edition) by Jurafsky and Martin, as well as Jurafsky's classes.