AuthID

A Python 3 program to identify the author of an unknown text. It analyses the characteristic n-gram frequencies of each author's works in the training set and then matches the texts in the test set against them.
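The general idea can be illustrated with a minimal sketch. It assumes character n-grams (n = 3), relative-frequency profiles and cosine similarity as the matching score. These are illustrative choices, not necessarily what AuthID.py does, and the function names are hypothetical. The sketch uses only the standard library, not NLTK.

```python
from __future__ import annotations

from collections import Counter
from math import sqrt


def ngram_profile(text: str, n: int = 3) -> dict[str, float]:
    """Hypothetical helper: relative frequencies of the character n-grams in `text`."""
    text = " ".join(text.lower().split())  # normalise case and collapse whitespace
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.items()} if total else {}


def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse frequency vectors (0.0 if either is empty)."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify_author(unknown: str, known: dict[str, list[str]], n: int = 3) -> str:
    """Hypothetical helper: return the author whose pooled training profile best matches `unknown`."""
    target = ngram_profile(unknown, n)
    scores = {
        author: cosine_similarity(target, ngram_profile(" ".join(texts), n))
        for author, texts in known.items()
    }
    return max(scores, key=scores.get)


known = {
    "Author#1": ["the cat sat on the mat", "the cat ate the rat"],
    "Author#2": ["quantum fields fluctuate wildly", "fields of quantum foam"],
}
print(identify_author("a cat on a mat", known))  # → Author#1
```

Pooling each author's texts into one profile is the simplest option; `max` returns the first author listed when scores tie, for example when no n-grams overlap at all.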

Prerequisites

You need Python 3 and NLTK installed. In addition, two directories, "Train Data" and "Test Data", must be present in the same directory as AuthID.py.

For example, for 3 authors, the directory structure is as follows:

├── AuthID.py
├── Train Data/
│   ├── Author#1/
│   │     ├──── known_text_1.txt
│   │     ├──── known_text_2.txt
│   │     └──── known_text_3.txt
│   │
│   ├── Author#2/
│   │     ├──── known_text_1.txt
│   │     └──── known_text_2.txt
│   │
│   └── Author#3/
│         ├──── known_text_1.txt
│         ├──── known_text_2.txt
│         ├──── known_text_3.txt
│         └──── known_text_4.txt
│    
└── Test Data/
    ├── unknown_text1.txt
    ├── unknown_text2.txt
    ├── unknown_text3.txt
    └── unknown_text4.txt
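Reading the layout above could look like the following sketch. It assumes every training and test file is UTF-8 plain text ending in `.txt`; `load_corpus` is a hypothetical name, not a function from AuthID.py.

```python
from __future__ import annotations

from pathlib import Path


def load_corpus(base: Path = Path(".")) -> tuple[dict[str, list[str]], dict[str, str]]:
    """Hypothetical loader for 'Train Data/<author>/*.txt' and 'Test Data/*.txt' under `base`."""
    train_dir, test_dir = base / "Train Data", base / "Test Data"
    for d in (train_dir, test_dir):
        if not d.is_dir():
            raise FileNotFoundError(f"Missing directory: {d}")
    # One entry per author sub-directory, holding the texts of that author's known works.
    known = {
        author_dir.name: [f.read_text(encoding="utf-8") for f in sorted(author_dir.glob("*.txt"))]
        for author_dir in sorted(train_dir.iterdir())
        if author_dir.is_dir()
    }
    # Unknown texts keyed by file name, e.g. "unknown_text1.txt".
    unknown = {f.name: f.read_text(encoding="utf-8") for f in sorted(test_dir.glob("*.txt"))}
    return known, unknown
```

With the example layout, `known` would map "Author#1", "Author#2" and "Author#3" to 3, 2 and 4 texts respectively, and `unknown` would hold the four test files.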

Running it from the command line

For Linux/Unix:

python3 AuthID.py

For Windows:

python AuthID.py

GitHub Link

https://github.com/siddharthchaini/AuthID/

License

MIT

Authors

Siddharth Chaini & Siddharth Bachoti

IISER Bhopal
