Smart News Text Embedding

This is an open-ended NLP project for deriving embeddings from free-form text.

This project is released under the Apache license, which is Google's default.

Installation

You will need Python 3.8 or higher to run this project. After cloning this repo, run the following commands to create a virtual environment and install all dependencies:

python3.8 -m venv bert_env
source bert_env/bin/activate
python setup.py install
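To confirm that the commands above left you in a suitable environment, a small check like the one below can be run with `python` once bert_env is active. It is not part of the project: `check_environment` is a hypothetical helper. It relies only on the standard library, where `sys.prefix` differs from `sys.base_prefix` inside a virtual environment.

```python
import sys


def check_environment(min_version=(3, 8)):
    """Return (version_ok, in_venv) for the running interpreter (hypothetical helper).

    version_ok: True if this Python meets the minimum version (3.8+ per this README).
    in_venv: True if a virtual environment such as bert_env is active; inside a venv,
    sys.prefix points at the venv while sys.base_prefix points at the base install.
    """
    version_ok = sys.version_info[:2] >= tuple(min_version)
    in_venv = sys.prefix != sys.base_prefix
    return version_ok, in_venv


if __name__ == "__main__":
    version_ok, in_venv = check_environment()
    status = "OK" if version_ok else "too old (need 3.8+)"
    print(f"Python {sys.version.split()[0]}: {status}")
    print(f"Virtual environment active: {in_venv}")
```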

Running the Code

We use this library to instantiate a BERT layer that can be used in Keras models with TensorFlow 2.2. To see how to run prediction on saved models or train new ones, along with the exact configuration options you can pass in, run python training/train_model.py --help. The script currently uses news article data that is available from the New York Times API.
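The README does not say which BERT implementation or input names the Keras layer uses. As a rough illustration, the sketch below builds the standard single-sentence BERT input format: token ids wrapped in [CLS] and [SEP], an attention mask, and segment ids, all padded to a fixed length. This is the kind of fixed-length input BERT layers in TF 2.x Keras models commonly consume. `make_bert_inputs` and `TOY_VOCAB` are hypothetical, and the toy ids do not match a real BERT vocabulary.

```python
# Hypothetical toy vocabulary; real BERT vocabularies assign different ids.
TOY_VOCAB = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
             "stocks": 4, "rally": 5, "today": 6}


def make_bert_inputs(tokens, vocab, max_seq_len=16):
    """Build (input_ids, input_mask, segment_ids) for one already-tokenized sentence."""
    if max_seq_len < 2:
        raise ValueError("max_seq_len must leave room for [CLS] and [SEP]")
    # Truncate so that [CLS] + tokens + [SEP] fits in max_seq_len.
    tokens = ["[CLS]"] + list(tokens[: max_seq_len - 2]) + ["[SEP]"]
    # Out-of-vocabulary tokens map to [UNK].
    input_ids = [vocab.get(t, vocab["[UNK]"]) for t in tokens]
    input_mask = [1] * len(input_ids)   # 1 marks a real token
    segment_ids = [0] * len(input_ids)  # single sentence: everything is segment 0
    # Pad all three sequences to the fixed length; padding gets mask 0.
    pad = max_seq_len - len(input_ids)
    input_ids += [vocab["[PAD]"]] * pad
    input_mask += [0] * pad
    segment_ids += [0] * pad
    return input_ids, input_mask, segment_ids


if __name__ == "__main__":
    ids, mask, segs = make_bert_inputs(["stocks", "rally", "today"], TOY_VOCAB, max_seq_len=8)
    print(ids)   # [2, 4, 5, 6, 3, 0, 0, 0]
    print(mask)  # [1, 1, 1, 1, 1, 0, 0, 0]
```

The mask lets the model ignore padding positions, so every example in a batch can share one tensor shape.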

Directory Structure

Below is a bird's-eye view of the project's directory structure:

config/
  requirements_keras.txt
  requirements_keras_gcp.txt
data/
  get_nyt_articles.py
smart_news_query_embeddings/
  __init__.py
  models/
    __init__.py
    bert_keras_model.py
    two_tower_model.py
  preprocessing/
    __init__.py
    bert_tokenizer.py
    specificity_scores.py
  trainers/
    __init__.py
    bert_model_trainer.py
    bert_model_specificity_score_trainer.py
    two_tower_model_trainer.py
  tests/
    [all unit tests here]
training/
  train_model.py
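The README does not describe what two_tower_model.py computes. In the general two-tower (dual-encoder) pattern, one tower embeds queries and another embeds items such as articles. Relevance is then scored by comparing the two vectors. The sketch below shows only that scoring step, with hand-written toy vectors standing in for tower outputs. Using cosine similarity is an assumption: the repo may use a plain dot product or something else. `cosine` and `rank_articles` are hypothetical helpers.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_articles(query_vec, article_vecs):
    """Return article indices ordered from most to least similar to the query."""
    scores = [cosine(query_vec, a) for a in article_vecs]
    return sorted(range(len(article_vecs)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # Toy stand-ins for the query tower's and article tower's outputs.
    query = [1.0, 0.0]
    articles = [[0.0, 1.0], [1.0, 0.1], [0.7, 0.7]]
    print(rank_articles(query, articles))  # [1, 2, 0]
```

Because the towers run independently, article embeddings can be precomputed once and only the query needs encoding at search time.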