Testing performance of Knora with large texts.
Requires:
- Start GraphDB.
- Create the `knora-test` repository using `knora-api/webapi/scripts/graphdb-se-local-init-knora-test.sh`.
- Delete the Redis cache: `rm dump.rdb`.
- Start Redis, Sipi, and Knora.
- Run `knora-create-ontology book-onto.json`.
- Stop Knora.
- Run `./upload-standoff-defs.sh`.
- Start Knora.
- Run `./send-mapping.py`.
- Run `./import.py INPUT`, where `INPUT` is a directory containing plain-text versions of books downloaded from Project Gutenberg.

The text is run through the NLTK POS tagger to add the following markup (where `WORD` is the word being marked up):
- `<noun>WORD</noun>` (`books:StandoffNounTag`) for nouns
- `<verb>WORD</verb>` (`books:StandoffVerbTag`) for verbs
- `<adj>WORD</adj>` (`books:StandoffAdjectiveTag`) for adjectives
- `<det>WORD</det>` (`books:StandoffDeterminerTag`) for determiners

Each group of ten words is wrapped in `<sentence>` (`books:StandoffSentenceTag`).
Each group of five `<sentence>` elements is wrapped in `<p>` (`standoff:StandoffParagraphTag`).
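
The markup scheme described above can be illustrated with a minimal sketch. It is not the code of `import.py`: the function names, the mapping from Penn Treebank tag prefixes to element names, and the handling of words with no matching tag are all assumptions made for illustration. The input is a list of `(word, tag)` pairs of the shape that `nltk.pos_tag` returns, hand-tagged here so the sketch runs without NLTK installed.

```python
from xml.sax.saxutils import escape

# Assumed mapping from Penn Treebank tag prefixes (the tagset nltk.pos_tag uses
# by default) to the element names listed above; import.py may map tags differently.
TAG_PREFIX_TO_ELEMENT = {"NN": "noun", "VB": "verb", "JJ": "adj", "DT": "det"}


def mark_up_word(word, pos_tag):
    """Wrap a word in its part-of-speech element, or return it escaped if unmapped."""
    text = escape(word)
    for prefix, element in TAG_PREFIX_TO_ELEMENT.items():
        if pos_tag.startswith(prefix):
            return f"<{element}>{text}</{element}>"
    return text


def chunks(items, size):
    """Split a list into consecutive groups of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def mark_up_text(tagged_words, words_per_sentence=10, sentences_per_paragraph=5):
    """Build <p>/<sentence> markup from a list of (word, pos_tag) pairs."""
    words = [mark_up_word(word, tag) for word, tag in tagged_words]
    sentences = [
        "<sentence>" + " ".join(group) + "</sentence>"
        for group in chunks(words, words_per_sentence)
    ]
    paragraphs = [
        "<p>" + "".join(group) + "</p>"
        for group in chunks(sentences, sentences_per_paragraph)
    ]
    return "".join(paragraphs)


if __name__ == "__main__":
    # Hand-tagged sample; with NLTK and its tagger data installed,
    # nltk.pos_tag(tokens) produces pairs of this shape.
    sample = [("The", "DT"), ("old", "JJ"), ("ship", "NN"),
              ("sailed", "VBD"), ("away", "RB")]
    print(mark_up_text(sample))
```

Because "sentences" here are fixed groups of ten words and "paragraphs" are fixed groups of five such groups, the amount of standoff markup grows predictably with text length, which suits a performance test.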