memo-canonical-novels 📚

This repository contains the code used to produce the embeddings, plots, and results for our paper:

"Canonical Status and Literary Influence: A Comparative Study of Danish Novels from the Modern Breakthrough (1870–1900)" presented at NLP4DH at EMNLP 2024.

Useful directions 📌

Where to find things in this repository:

memo_canonical_novels/ is the main folder. It contains the source code for the project, including the Makefile used to create embeddings.
notebooks/ contains the notebooks used for the analysis. analysis.py is the main notebook, and tfidf_comparison.py compares the embeddings with tf-idf. The other notebooks contain sanity checks.
figures/ contains the figures generated by the notebooks.
data/ contains the saved embeddings (.json) used for the analysis, and will also hold any embeddings you generate yourself.
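
As a rough illustration of working with the saved .json embeddings, the sketch below reads one file into a dictionary. The layout assumed here (a JSON object mapping a novel identifier to a list of numbers) and the function name `load_embeddings` are hypothetical; inspect the files in data/ for the actual schema before relying on it.

```python
import json
from pathlib import Path


def load_embeddings(path):
    """Load a JSON file ASSUMED to map novel IDs to embedding vectors."""
    with Path(path).open(encoding="utf-8") as f:
        raw = json.load(f)
    # Cast every value to float so downstream code sees a single numeric type.
    return {novel_id: [float(x) for x in vec] for novel_id, vec in raw.items()}
```

Calling `load_embeddings` on a file in data/ would then return a plain dictionary, provided the file follows the assumed layout.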

Data & paper 📝

The dataset used is available on Hugging Face.

Please cite our paper if you use the code or the embeddings:

@inproceedings{feldkamp-etal-2024-canonical,
    title = "Canonical Status and Literary Influence: A Comparative Study of {D}anish Novels from the Modern Breakthrough (1870{--}1900)",
    author = "Feldkamp, Pascale  and
      Lassche, Alie  and
      Kostkan, Jan  and
      Kardos, M{\'a}rton  and
      Enevoldsen, Kenneth  and
      Baunvig, Katrine  and
      Nielbo, Kristoffer",
    editor = {H{\"a}m{\"a}l{\"a}inen, Mika  and
      {\"O}hman, Emily  and
      Miyagawa, So  and
      Alnajjar, Khalid  and
      Bizzoni, Yuri},
    booktitle = "Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities",
    month = nov,
    year = "2024",
    address = "Miami, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.nlp4dh-1.14",
    pages = "140--155"
}

Project Organization 🏗️

├── LICENSE            <- Open-source license if one is chosen
├── Makefile           <- Makefile with convenience commands like `make data` or `make train`
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── notebooks          <- Jupyter notebooks.
│
├── pyproject.toml     <- Project configuration file with package metadata for 
│                         memo_canonical_novels and configuration for tools like black
│
├── figures            <- Generated graphics and figures to be used in reporting
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`
│
├── setup.cfg          <- Configuration file for flake8
│
└── src                <- Source code for use in this project, making embeddings.
    │
    ├── __init__.py             <- Makes memo_canonical_novels a Python module
    │
    ├── config.py               <- Store useful variables and configuration
    │
    ├── dataset.py              <- Scripts to download or generate data
    │
    ├── features.py             <- Code to create features for modeling
    │
    ├── modeling                
    │   ├── __init__.py 
    │   ├── predict.py          <- Code to run model inference with trained models          
    │   └── train.py            <- Code to train models
    ├── pooling.py              <- Code to create average embeddings from raw embeddings
    │
    └── plots.py                <- Code to create visualizations
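
The description of pooling.py ("average embeddings from raw embeddings") suggests mean pooling: several raw vectors for one novel are averaged element-wise into a single vector. The sketch below shows that idea in plain Python. The function name `mean_pool` and the assumption that the raw embeddings come as one vector per text chunk are illustrative only; this is not the code in pooling.py.

```python
def mean_pool(chunk_embeddings):
    """Average equal-length vectors element-wise into a single vector."""
    if not chunk_embeddings:
        raise ValueError("need at least one embedding to pool")
    dim = len(chunk_embeddings[0])
    if any(len(vec) != dim for vec in chunk_embeddings):
        raise ValueError("all embeddings must have the same dimension")
    n = len(chunk_embeddings)
    # zip(*...) groups the i-th component of every vector together.
    return [sum(values) / n for values in zip(*chunk_embeddings)]


pooled = mean_pool([[1.0, 2.0], [3.0, 4.0]])
print(pooled)  # [2.0, 3.0]
```

A plain unweighted mean treats every chunk equally; weighting by chunk length would be a different design choice and is not shown here.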
