sbadirli/Open-Set-Authorship-Attribution

Open-Set-Authorship-Attribution

Jupyter Notebooks and dataset link for the paper: Open Set Authorship Attribution toward Demystifying Victorian Periodicals. Accepted to ICDAR 2021, Switzerland. Authors: Sarkhan Badirli, Mary B. Ton, Abdulmecit Gungor, and Murat Dundar

Paper at: https://arxiv.org/abs/1912.08259

Brief Summary

In this paper, we take a pragmatic view of computational authorship attribution (AA) to highlight the critical role it can play in attribution studies involving historical texts. We consider Victorian texts as a case study because many contemporary literary tropes and publishing strategies originate from this period. We demonstrate the strengths and weaknesses of existing computational AA paradigms. Specifically, we show that common English words are largely sufficient to distinguish among the writings of most renowned authors, especially when AA is performed in the closed-set setup. Experiments under the closed-set assumption produced near-perfect attribution accuracy on an AA task involving 36 authors using only the 1,000 most frequent words. Performance suffered significantly when we switched to the more realistic open-set setup. Increasing the vocabulary size helped to some extent and provided some interesting insights that would challenge the results of manual indexing. The open-set experiments also open avenues for future research into whether authors with the same upbringing may develop similar word usage habits, as in the case of the Brontë sisters.
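To make the closed-set versus open-set distinction concrete, here is a minimal sketch in plain Python. It is not the method from the paper: it assumes relative frequencies of the most frequent words as features, a nearest-centroid classifier with cosine similarity, and a simple similarity threshold for rejecting unknown authors. All function names, the `UNKNOWN` label, and the `threshold` parameter are illustrative assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def top_words(texts, k):
    """Return the k most frequent words across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return [word for word, _ in counts.most_common(k)]


def featurize(text, vocab):
    """Relative frequency of each vocabulary word in the text."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = max(len(tokens), 1)
    return [counts[word] / total for word in vocab]


def cosine(a, b):
    """Cosine similarity; defined as 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fit_centroids(docs, labels, vocab):
    """Average the feature vectors of each author's documents."""
    sums, n_docs = {}, Counter()
    for text, author in zip(docs, labels):
        vec = featurize(text, vocab)
        prev = sums.get(author, [0.0] * len(vocab))
        sums[author] = [s + v for s, v in zip(prev, vec)]
        n_docs[author] += 1
    return {a: [s / n_docs[a] for s in vec] for a, vec in sums.items()}


def predict(text, centroids, vocab, threshold=None):
    """Closed-set if threshold is None: always return the closest author.

    Open-set otherwise (an assumed rejection rule, not the paper's):
    return "UNKNOWN" when the best similarity falls below the threshold.
    """
    vec = featurize(text, vocab)
    scored = ((author, cosine(vec, c)) for author, c in centroids.items())
    best, score = max(scored, key=lambda pair: pair[1])
    if threshold is not None and score < threshold:
        return "UNKNOWN", score
    return best, score
```

In the closed-set mode every text is forced onto one of the known authors, which is why a text by an unseen author can only be handled once some rejection rule, such as the assumed threshold here, is added.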

Prerequisites

The code was implemented in Python using Jupyter Notebooks. For the list of packages, please see requirements.txt. You may create a conda virtual environment to keep the experiment setup hassle-free.

Data

You can download the datasets used in the paper from Dropbox. The dataset contains authors and their books in separate text files, a vocabulary list, and the 1,000 most frequent words.
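A hedged sketch of how such a dataset might be loaded is shown below. The directory layout (`<data_dir>/<author>/<book>.txt`) and the one-word-per-line format of the word lists are assumptions made for illustration only; check the downloaded archive for the actual structure. The helper names `load_corpus` and `load_word_list` are hypothetical.

```python
from pathlib import Path


def load_corpus(data_dir):
    """Read every book and label it with its author.

    Assumes a hypothetical layout: <data_dir>/<author>/<book>.txt
    """
    docs, labels = [], []
    for path in sorted(Path(data_dir).glob("*/*.txt")):
        # errors="replace" keeps loading even if a file has stray bytes
        docs.append(path.read_text(encoding="utf-8", errors="replace"))
        labels.append(path.parent.name)
    return docs, labels


def load_word_list(path):
    """Read a word list, assuming one word per line; skip blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]
```

If the archive uses a different layout, only the glob pattern and the way the author label is derived from the path should need to change.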

Experiments

Jupyter Notebooks are coming soon!

Contact

Feel free to drop me an email if you have any questions: [email protected]
