- What is MeaLeon?
- How did MeaLeon come about?
- How it works
- Future Steps
- Requirements
- Running the Flask Application
MeaLeon is a machine learning powered dish and recipe recommendation tool. It asks a user to type a dish that they like and choose the cuisine type it is from. It then presents the five most similar recipes back to the user with links to their location on Epicurious.
There are a few origin stories for MeaLeon.
-
One of the creator's friends hates sauerkraut and kimchi, and expressed their disgust by saying "every culture has a gross pickled cabbage dish."
-
One day, the creator was making ground beef tacos but ran out of cumin. In desperation, he grabbed something that he thought contained mostly cumin and used that: garam masala powder. Taco Tuesday suddenly had an Indian twist despite keeping most of the ingredients the same.
-
The creator regularly cooks, but tends to make the same dishes over and over and wanted something/someone to help him make new dishes without buying new spices and herbs for single usage.
All of those combined to form a desire to make something that would help him find relationships between different cuisines and make new foods!
The database was created from a 2017 scrape of Epicurious.com. The recipes in this database have been cleaned and converted to a matrix that is used via natural language processing and machine learning. The ingredients for each recipe have been transformed into vectors via term frequency, inverse document frequency techniques using the toolkit provided by scikit-learn.
The app uses the user inputs to call upon the Edamam API and takes the top 10 recipes returned to create a query vector that is then compared to the database recipes via cosine similarity.
This particular branch (tfidf_master
) is intended to be the standard branch that is used for MeaLeon.
The database, from a foodie's point of view, could use more data.
-
The cuisine classifications do not dive into deep classifications for many cuisine types and this should be fleshed out.
-
In an effort to add more recipes to the database, other sites and recipes will be scraped, classified, and added to the database. If you know of any recipes or recipe sites, especially for the cuisines of Africa, Southeast Asia, Oceania, and South America, please help!
-
Other recipes should ideally contribute to under represented cuisines in the database.
For 2 and 3: Finding new recipes and adding to the database will be moved into a new repo. The libraries and scripts needed for that work is different and lighter weight. Moving the scraping to a new repo should allow for that work to be done on less powerful computing hardware and represents a new MVP.
-
A classification algorithm will need to be used to estimate the cuisine classifications of new recipes.
-
That algorithm will then need to incorporate the ability to take multi-label classifications: For example, an ideal classification for Dandan noodles should return Sichuan, Chinese, and East Asian for cuisine.
-
Other implementations of this would be interesting for home cooks. One example would be an Alexa skill or Google Home integration to display or read aloud the proposed recipe steps and ingredients via smart home devices.
-
After discussion with Karen Warmbein, I will attempt a model using Word2Vec with a CBoW architecture using a CBoW window of 1 word to see the difference. She suggested that the initial training would speed up as compared to TF-IDF and I wonder if the size of the deployed app would be smaller.
- Incorporate a better template for notebook files
- Refactor to use spaCy or stanza over NLTK
- Update to new git branch architecture: main with parallel dev branch, branch has sub parallels with features. Each feature branch starts with date_initials_feature
- Update heroku deployment as it is currently down (likely due to security vulnerability)
- Incorporate Data Version Control for database and relational data store for MLFlow
- Source for SQL database for MLFlow and also object store for Data Version Control
- Docker container alternate installation instructions/set up
- Update database of recipes
- Switch off of Edamam API or update knowledge of how it works since the results it returns have changed significantly.
This repo has switched to using Python 3.10 via ASDF and Python package management via Poetry.
If you've got both installed (INSTRUCTIONS PENDING), you can go into your Terminal/CLI, go to the base of this repository, and type
poetry install
to get the base installation working.
To run in a development environment (on your local computer)
(CHECK THESE INSTRUCTIONS)
export FLASK_ENV=development
env FLASK_APP=app.py flask run
To run in a production environment (used for deployment, but test it out locally first):
export FLASK_ENV=production
python app.py
If running via Docker, the CLI instructions I used that worked were
docker run -it --rm -p 5000:5000 awashingtonchen/mealeon_docker
The -p flag is needed to specify the port. The --rm flag is used to clean up the container afterwards (good for quick tests on development). The -it tag is for interactive and I used it initially because there are some bash commands inside the script to download NLTK if it isn't installed, or to access files inside the container.