Abstractive Text Summarization

The Algorithm

http://arxiv.org/abs/1509.00685

Our implementation differs in that we fix the context and summary token embedding matrices:

- Both embedding matrices are initialised from GloVe (see the sketch below).
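As a rough illustration of what fixed, GloVe-initialised context and summary embedding matrices could look like, here is a minimal sketch assuming a PyTorch implementation; the framework, the placeholder GloVe tensor, and all variable names are assumptions for illustration and are not taken from main.py.

```python
import torch
import torch.nn as nn

vocab_size, embed_dim = 10_000, 300

# Placeholder for the real pre-loaded GloVe vectors, shape (vocab_size, embed_dim).
glove_matrix = torch.randn(vocab_size, embed_dim)

# Separate context and summary embedding matrices, both initialised from GloVe
# and frozen so they are not updated during training.
context_embedding = nn.Embedding.from_pretrained(glove_matrix.clone(), freeze=True)
summary_embedding = nn.Embedding.from_pretrained(glove_matrix.clone(), freeze=True)

# Example lookup: map a batch of token ids to their fixed embeddings.
token_ids = torch.tensor([1, 2, 3])
context_vectors = context_embedding(token_ids)  # shape (3, 300)
```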

Help text:

```
python3 main.py -h
```

The training datasets are under data/. Each JSON file contains three fields: title, full_text, and summary. They are downloaded with the scripts in download_data/.
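For reference, a data file with those three fields could be read as follows; the file name data/example.json is illustrative, not an actual file in the repository.

```python
import json

# Load one training example and inspect its three fields.
with open("data/example.json") as f:
    record = json.load(f)

print(record["title"])      # article headline
print(record["full_text"])  # body text used as the model input
print(record["summary"])    # reference summary used as the target
```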

GloVe data needs to be downloaded and unzipped under glove/. The code uses the 10k most frequent tokens by default. To generate the embeddings for them:

```
cd glove
head -n 10000 glove.6B.300d.txt > glove.10k.300d.txt
```
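A small sketch of how the truncated GloVe file might then be parsed into an embedding matrix; the NumPy-based approach and the variable names are assumptions and not necessarily what main.py does.

```python
import numpy as np

# Each line of the GloVe text file is: <token> <300 space-separated floats>.
embeddings = {}
with open("glove/glove.10k.300d.txt", encoding="utf-8") as f:
    for line in f:
        token, *values = line.rstrip().split(" ")
        embeddings[token] = np.asarray(values, dtype=np.float32)

# Stack into a (10000, 300) matrix, keeping file order (most frequent tokens first).
matrix = np.stack(list(embeddings.values()))
print(matrix.shape)  # (10000, 300)
```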