diff --git a/README.md b/README.md
index bf90261..bc12649 100644
--- a/README.md
+++ b/README.md
@@ -2,3 +2,5 @@
 This is a german text corpus from Wikipedia. It is cleaned and preprocessed and useful to train NLP embeddings for example.
 
 As Wikipedia itself this is published under [Creative Commons Attribution-ShareAlike 3.0 Unported license](https://de.wikipedia.org/wiki/Wikipedia:Lizenzbestimmungen_Creative_Commons_Attribution-ShareAlike_3.0_Unported).
+
+You can download the texts here: https://github.com/t-systems-on-site-services-gmbh/german-wikipedia-text-corpus/releases/tag/files_1