Skip to content

A Python3 program to identify the author of an unknown text. This is done by analysing the charactertistic ngram frequencies of the authors' works in the training set, and the matched to data in the test set.

License

Notifications You must be signed in to change notification settings

sidchaini/AuthID

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

19 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

AuthID

A Python3 program to identify the author of an unknown text. This is done by analysing the charactertistic ngram frequencies of the authors' works in the training set, and the matched to data in the test set.

Prerequisites

You need python3 and nltk installed. Further, there should be 2 directories - "Train Data" and "Test Data", present in the directory where the py file is located.

For example, for 3 authors, the directory structure is as follows:

├── AuthID.py
├── Train Data/
│   ├── Author#1/
│   │     ├──── known_text_1.txt
│   │     ├──── known_text_2.txt
│   │     └──── known_text_3.txt
│   │
│   ├── Author#2/
│   │     ├──── known_text_1.txt
│   │     └──── known_text_2.txt
│   │
│   └── Author#3/
│         ├──── known_text_1.txt
│         ├──── known_text_2.txt
│         ├──── known_text_3.txt
│         └──── known_text_4.txt
│    
└── Test Data/
    ├── unknown_text1.txt
    ├── unknown_text2.txt
    ├── unknown_text3.txt
    └── unknown_text4.txt

Running it from command line

For Linux/Unix:

python3 AuthID.py

For Windows:

python AuthID.py

Github Link

https://github.com/siddharthchaini/AuthID/

License

MIT

Authors

Siddharth Chaini & Siddharth Bachoti

IISER Bhopal

About

A Python3 program to identify the author of an unknown text. This is done by analysing the charactertistic ngram frequencies of the authors' works in the training set, and the matched to data in the test set.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages