diff --git a/README.md b/README.md
index dfedac4..bf90261 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
 # German Wikipedia Text Corpus
-This is a german text corpus from Wikipedia. It is useful to train NLP embeddings for example.
+This is a german text corpus from Wikipedia. It is cleaned and preprocessed and useful to train NLP embeddings for example.
 As Wikipedia itself this is published under [Creative Commons Attribution-ShareAlike 3.0 Unported license](https://de.wikipedia.org/wiki/Wikipedia:Lizenzbestimmungen_Creative_Commons_Attribution-ShareAlike_3.0_Unported).