knora-large-texts

Testing performance of Knora with large texts.

Requires:

Creating the Repository

Start GraphDB.
Create the knora-test repository using knora-api/webapi/scripts/graphdb-se-local-init-knora-test.sh.
Delete the Redis cache: rm dump.rdb.
Start Redis, Sipi, and Knora.
Run knora-create-ontology book-onto.json.
Stop Knora.
Run ./upload-standoff-defs.sh.
Start Knora.
Run ./send-mapping.py.
Run ./import.py INPUT, where INPUT is a directory containing plain-text versions of books downloaded from Project Gutenberg.
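
As a rough illustration of the last step, the sketch below shows one way a script might collect the plain-text books from the INPUT directory before sending them to Knora. It is not taken from import.py: the function names list_books and read_book, the .txt extension filter, and the UTF-8 decoding are all assumptions made for this example.

```python
from pathlib import Path


def list_books(input_dir):
    """Return the plain-text files in input_dir, sorted by file name.

    Assumption: books are stored as *.txt files directly inside the directory.
    """
    return sorted(
        p for p in Path(input_dir).iterdir()
        if p.is_file() and p.suffix == ".txt"
    )


def read_book(path):
    """Read one book as text, replacing bytes that are not valid UTF-8.

    Assumption: the files are (mostly) UTF-8 encoded.
    """
    return path.read_text(encoding="utf-8", errors="replace")
```

Sorting the paths keeps the import order reproducible between runs, which can make timing comparisons easier.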

The text is run through the NLTK POS tagger to add (where WORD is the word being marked up):

Each group of ten words is wrapped in <sentence> (books:StandoffSentenceTag).

Each group of five <sentence> elements is wrapped in <p> (standoff:StandoffParagraphTag).
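
The grouping described above can be sketched as follows. This is a minimal illustration, not the code in import.py: it splits on whitespace instead of using the NLTK tagger, omits the word-level markup, and wraps the result in a hypothetical <text> root element. Only the <sentence> and <p> element names and the group sizes of ten and five come from the description above.

```python
from xml.sax.saxutils import escape

WORDS_PER_SENTENCE = 10       # group size stated in the description
SENTENCES_PER_PARAGRAPH = 5   # group size stated in the description


def chunk(items, size):
    """Split a list into consecutive groups of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def wrap_text(text):
    """Group words into <sentence> elements and sentences into <p> elements."""
    words = text.split()  # assumption: plain whitespace tokenization
    sentences = [
        "<sentence>" + escape(" ".join(group)) + "</sentence>"
        for group in chunk(words, WORDS_PER_SENTENCE)
    ]
    paragraphs = [
        "<p>" + "".join(group) + "</p>"
        for group in chunk(sentences, SENTENCES_PER_PARAGRAPH)
    ]
    # Assumption: <text> is a placeholder root element for this sketch.
    return "<text>" + "".join(paragraphs) + "</text>"
```

In this sketch the last sentence and the last paragraph may be shorter than the others when the word count is not a multiple of the group size.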
