dhlab-basel/knora-large-texts

knora-large-texts

Testing performance of Knora with large texts.

Requires:

Creating the Repository

  1. Start GraphDB.

  2. Create the knora-test repository using knora-api/webapi/scripts/graphdb-se-local-init-knora-test.sh.

  3. Delete the Redis cache: rm dump.rdb.

  4. Start Redis, Sipi, and Knora.

  5. Run knora-create-ontology book-onto.json.

  6. Stop Knora.

  7. Run ./upload-standoff-defs.sh.

  8. Start Knora.

  9. Run ./send-mapping.py.

  10. Run ./import.py INPUT, where INPUT is a directory containing plain-text versions of books downloaded from Project Gutenberg.

Generated Markup

The text is run through the NLTK POS tagger, which adds the following markup (where WORD is the word being marked up):

  • <noun>WORD</noun> (books:StandoffNounTag) for nouns
  • <verb>WORD</verb> (books:StandoffVerbTag) for verbs
  • <adj>WORD</adj> (books:StandoffAdjectiveTag) for adjectives
  • <det>WORD</det> (books:StandoffDeterminerTag) for determiners

Each group of ten words is wrapped in <sentence> (books:StandoffSentenceTag).

Each group of five <sentence> elements is wrapped in <p> (standoff:StandoffParagraphTag).
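
The markup rules above can be sketched as follows. This is a minimal illustration, not the code in import.py: the function names, the mapping from Penn Treebank tag prefixes (NN, VB, JJ, DT) to elements, and the whitespace between elements are all assumptions made for the example. The input is a list of (word, tag) pairs of the kind a POS tagger returns.

```python
from xml.sax.saxutils import escape

# Assumed mapping from Penn Treebank tag prefixes to the element names
# listed above; the real import script may classify tags differently.
POS_PREFIX_TO_ELEMENT = {"NN": "noun", "VB": "verb", "JJ": "adj", "DT": "det"}

WORDS_PER_SENTENCE = 10
SENTENCES_PER_PARAGRAPH = 5


def mark_word(word, pos):
    """Wrap a word in its POS element, or return it XML-escaped if none applies."""
    text = escape(word)
    for prefix, element in POS_PREFIX_TO_ELEMENT.items():
        if pos.startswith(prefix):
            return f"<{element}>{text}</{element}>"
    return text


def chunks(items, size):
    """Split a list into consecutive groups of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def generate_markup(tagged_words):
    """Build <p>/<sentence> markup from a list of (word, pos) pairs."""
    words = [mark_word(word, pos) for word, pos in tagged_words]
    sentences = [
        f"<sentence>{' '.join(group)}</sentence>"
        for group in chunks(words, WORDS_PER_SENTENCE)
    ]
    paragraphs = [
        f"<p>{''.join(group)}</p>"
        for group in chunks(sentences, SENTENCES_PER_PARAGRAPH)
    ]
    return "\n".join(paragraphs)
```

With NLTK installed, pairs of this shape can be obtained from `nltk.pos_tag(nltk.word_tokenize(text))`. The last sentence and paragraph of a text may be shorter than ten words or five sentences, since the input rarely divides evenly.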
