This repository contains code for embeddings, plots and results for our paper:
"Canonical Status and Literary Influence: A Comparative Study of Danish Novels from the Modern Breakthrough (1870–1900)" presented at NLP4DH at EMNLP 2024.
Some useful directions:
- `memo_canonical_novels/`: the main folder. It contains the source code for the project, including the Makefile used to create embeddings.
- `notebooks/`: contains the notebooks used for the analysis.
  - `analysis.py` is the main notebook.
  - `tfidf_comparison.py` is the notebook used to compare the embeddings with tf-idf.
  - Other notebooks contain sanity checks.
- `figures/`: contains the figures generated by the notebooks.
- `data/`: contains saved embeddings (.json) used for the analysis (and will contain generated embeddings if you generate them).
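
The saved embeddings in `data/` are JSON files, so they can be read with the standard library alone. The snippet below is only a minimal sketch: the function name `load_embeddings` is hypothetical, and it assumes a flat layout that maps each novel identifier to a list of floats. The actual schema of the files in this repository may differ, so inspect one file before relying on it.

```python
import json
from pathlib import Path


def load_embeddings(path):
    """Read a JSON file of embeddings into a dict of float lists.

    ASSUMPTION: the file is a flat object {"novel_id": [float, ...], ...}.
    The real files in data/ may use a different schema.
    """
    with Path(path).open(encoding="utf-8") as handle:
        raw = json.load(handle)

    embeddings = {
        novel_id: [float(value) for value in vector]
        for novel_id, vector in raw.items()
    }

    # Sanity check: every vector should have the same dimensionality.
    dimensions = {len(vector) for vector in embeddings.values()}
    if len(dimensions) > 1:
        raise ValueError(f"inconsistent embedding sizes: {sorted(dimensions)}")
    return embeddings
```

If the files turn out to be nested (for example, one entry per text chunk), the dictionary comprehension is the only part that would need to change.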
The dataset used is available on Hugging Face.
Please cite our paper if you use the code or the embeddings:
@inproceedings{feldkamp-etal-2024-canonical,
title = "Canonical Status and Literary Influence: A Comparative Study of {D}anish Novels from the Modern Breakthrough (1870{--}1900)",
author = "Feldkamp, Pascale and
Lassche, Alie and
Kostkan, Jan and
Kardos, M{\'a}rton and
Enevoldsen, Kenneth and
Baunvig, Katrine and
Nielbo, Kristoffer",
editor = {H{\"a}m{\"a}l{\"a}inen, Mika and
{\"O}hman, Emily and
Miyagawa, So and
Alnajjar, Khalid and
Bizzoni, Yuri},
booktitle = "Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities",
month = nov,
year = "2024",
address = "Miami, USA",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.nlp4dh-1.14",
pages = "140--155"
}
Project organization:

├── LICENSE            <- Open-source license if one is chosen
├── Makefile           <- Makefile with convenience commands like `make data` or `make train`
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── notebooks          <- Jupyter notebooks.
│
├── pyproject.toml     <- Project configuration file with package metadata for
│                         memo_canonical_novels and configuration for tools like black
│
├── figures            <- Generated graphics and figures to be used in reporting
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`
│
├── setup.cfg          <- Configuration file for flake8
│
└── src                <- Source code for use in this project, making embeddings.
    │
    ├── __init__.py    <- Makes memo_canonical_novels a Python module
    │
    ├── config.py      <- Store useful variables and configuration
    │
    ├── dataset.py     <- Scripts to download or generate data
    │
    ├── features.py    <- Code to create features for modeling
    │
    ├── modeling
    │   ├── __init__.py
    │   ├── predict.py <- Code to run model inference with trained models
    │   └── train.py   <- Code to train models
    │
    ├── pooling.py     <- Code to create average embeddings from raw embeddings
    │
    └── plots.py       <- Code to create visualizations
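
The tree describes `pooling.py` as creating average embeddings from raw embeddings. As a rough illustration of that idea, and not the repository's actual implementation, the sketch below averages a list of vectors element-wise. The function name `mean_pool` is hypothetical, as is the assumption that the raw embeddings are equal-length vectors, one per text chunk.

```python
def mean_pool(chunk_embeddings):
    """Average a non-empty list of equal-length vectors element-wise.

    ASSUMPTION: each raw embedding is a plain list of numbers, one per
    text chunk; the real pooling.py may work on other data structures.
    """
    if not chunk_embeddings:
        raise ValueError("need at least one embedding to pool")

    dim = len(chunk_embeddings[0])
    if any(len(vector) != dim for vector in chunk_embeddings):
        raise ValueError("all embeddings must have the same dimensionality")

    count = len(chunk_embeddings)
    # Position i of the result is the mean of position i across all chunks.
    return [sum(vector[i] for vector in chunk_embeddings) / count for i in range(dim)]


# Two 2-dimensional chunk embeddings pooled into one document embedding.
pooled = mean_pool([[1.0, 2.0], [3.0, 4.0]])
print(pooled)  # [2.0, 3.0]
```

Plain averaging weights every chunk equally. If chunks differ a lot in length, a length-weighted mean is a common alternative.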