knora-large-texts

Tests the performance of Knora with large texts.

Requires:

Creating the Repository

  1. Start GraphDB.

  2. Create the knora-test repository using knora-api/webapi/scripts/graphdb-se-local-init-knora-test.sh.

  3. Delete the Redis cache: rm dump.rdb.

  4. Start Redis, Sipi, and Knora.

  5. Run knora-create-ontology book-onto.json.

  6. Stop Knora.

  7. Run ./upload-standoff-defs.sh.

  8. Start Knora.

  9. Run ./send-mapping.py.

  10. Run ./import.py INPUT, where INPUT is a directory containing plain-text versions of books downloaded from Project Gutenberg.

Generated Markup

The text is run through the NLTK part-of-speech (POS) tagger, which adds the following markup (where WORD is the word being marked up):

  • <noun>WORD</noun> (books:StandoffNounTag) for nouns
  • <verb>WORD</verb> (books:StandoffVerbTag) for verbs
  • <adj>WORD</adj> (books:StandoffAdjectiveTag) for adjectives
  • <det>WORD</det> (books:StandoffDeterminerTag) for determiners
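
The mapping from POS tags to elements could look roughly like the sketch below. It is not the code in import.py. It assumes the tagger emits Penn Treebank tags (the default for `nltk.pos_tag`), matches them by prefix, and XML-escapes each word. The names `TAG_PREFIX_TO_ELEMENT`, `mark_up_word`, and `mark_up_words` are hypothetical.

```python
from xml.sax.saxutils import escape

# Assumption: Penn Treebank tags, matched by prefix (e.g. NN, NNS, NNP -> noun).
# This table is illustrative, not taken from import.py.
TAG_PREFIX_TO_ELEMENT = (
    ("NN", "noun"),  # books:StandoffNounTag
    ("VB", "verb"),  # books:StandoffVerbTag
    ("JJ", "adj"),   # books:StandoffAdjectiveTag
    ("DT", "det"),   # books:StandoffDeterminerTag
)


def mark_up_word(word, pos_tag):
    """Wrap one word in the element for its POS tag, or return it unwrapped."""
    text = escape(word)  # assumption: words are XML-escaped before wrapping
    for prefix, element in TAG_PREFIX_TO_ELEMENT:
        if pos_tag.startswith(prefix):
            return f"<{element}>{text}</{element}>"
    return text


def mark_up_words(tagged_words):
    """Mark up a sequence of (word, pos_tag) pairs."""
    return [mark_up_word(word, pos_tag) for word, pos_tag in tagged_words]
```

The (word, tag) pairs would typically come from `nltk.pos_tag(nltk.word_tokenize(text))`, which requires the NLTK tokenizer and tagger data to be downloaded first.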

Each group of ten words is wrapped in <sentence> (books:StandoffSentenceTag).

Each group of five <sentence> elements is wrapped in <p> (standoff:StandoffParagraphTag).
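
The fixed-size grouping described above can be sketched as follows. This is an illustration, not the code in import.py. It assumes that a trailing group with fewer than ten words (or fewer than five sentences) is still wrapped, and that words are joined with single spaces. The function names are hypothetical.

```python
def chunk(items, size):
    """Split a list into consecutive groups of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def wrap_sentences_and_paragraphs(words, words_per_sentence=10,
                                  sentences_per_paragraph=5):
    """Wrap words in <sentence> elements and sentences in <p> elements.

    `words` is a list of strings that may already carry word-level markup.
    Assumption: incomplete trailing groups are wrapped like full ones.
    """
    sentences = [
        "<sentence>" + " ".join(group) + "</sentence>"
        for group in chunk(words, words_per_sentence)
    ]
    paragraphs = [
        "<p>" + "".join(group) + "</p>"
        for group in chunk(sentences, sentences_per_paragraph)
    ]
    return "\n".join(paragraphs)
```

With these defaults, a paragraph holds at most 50 words, so a 60-word input yields six `<sentence>` elements in two `<p>` elements.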