GitHub - akthom/ParatextsAndDocumentaryPractices: code, data and supplementary material for Weber & Thomer 2013

this directory contains code and files from
Weber, N. & Thomer, A. (2014). Paratexts and Documentary Practices: Text Mining Authorship and Acknowledgement from a Bioinformatics Corpus. In Examining Paratextual Theory and its Applications in Digital Culture. Nadine Desrochers and Daniel Apollon, eds. IGI Global Press. Hershey, Pennsylvania.
and
Thomer, A., Weber, N. (2014). Using Named Entity Recognition as a Classification Heuristic. Poster presented at iConference 2014. Berlin. http://hdl.handle.net/2142/47362
and
Weber, N., Thomer, A. (2014). Automating the Classification of Author Contribution Statements. Poster presented at the International Digital Curation Conference. San Francisco, CA.

email us at [email protected] and [email protected] with any questions -- some of these files might be messy.

ParatextExtractor: code using BeautifulSoup to extract a few variables from PubMed articles Ver1 - first version (adoy) that only extracts acknowledgements marked with JATS tag Ver 2 - extracts a broader range of statements and also counts number of authors per article; also includes one sample bit of text for easy testing

BeautifulSoupReplaceHyphens: simple script that replaces "-" with "_" in all xml variable names (BUT NOT attributes). This makes it possible to actually parse the files in python, because python does not recognize "-" as a valid variable character.

ParatextNERExtractor.py: pulls out names from NERed text

preprocessedText.txt: first batch of preprocessed text

ProcessedTextWithAuthorTotals.xlsx: excel spreadsheet containing extracted acknowledgment statements + total number of authors per article, as well as some figures from the paper

ListOfAcknowledgedPersons.xlsx: spreadsheet containing PMIDS, Acknoweldged Persons, and Number of Acknowledged persons SampleData: original files used for testing.

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
sampleData		sampleData
AckList.csv		AckList.csv
AcksForFinalCoding.txt		AcksForFinalCoding.txt
AuthorsbyAcknowledgement.xlsx		AuthorsbyAcknowledgement.xlsx
BeautifulSoupReplaceHyphens.py		BeautifulSoupReplaceHyphens.py
FinalAcks-withtaggedNEs.txt		FinalAcks-withtaggedNEs.txt
ListOfAcknowledgedPersons.xlsx		ListOfAcknowledgedPersons.xlsx
ParatextExtractorVer2.py		ParatextExtractorVer2.py
ParatextNERExtractor.py		ParatextNERExtractor.py
ProcessedTextWithAuthorTotals.xlsx		ProcessedTextWithAuthorTotals.xlsx
README.md		README.md
finalDataWithAuthors.csv		finalDataWithAuthors.csv
preprocessedText.txt		preprocessedText.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Releases

Packages

Languages

akthom/ParatextsAndDocumentaryPractices

Folders and files

Latest commit

History

Repository files navigation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages