diff --git a/Text_classification_of_newspaper_clippings_notebook.ipynb b/Text_classification_of_newspaper_clippings_notebook.ipynb
index cbc9e55..0d82dcc 100644
--- a/Text_classification_of_newspaper_clippings_notebook.ipynb
+++ b/Text_classification_of_newspaper_clippings_notebook.ipynb
@@ -17,7 +17,7 @@
  "\n",
  "For classification, topic modelling (LDA) was chosen because it showed the best performance in classification (after experiments with word embeddings or LDA and word embeddings combined). LDA provides a way to group documents by topic and perform similarity searches and improve precision. Thanks to sklearn, it is relatively easy to test different classifiers for a given topic classification task. Logistic regression was chosen as binary classifier. \n",
  "\n",
- "*Following graph demonstrates the distribution of the pre-defined categories in newspaper clippings of seleceted Austrian Newspapers (~1000 clippings) on the topic of emigration.* \n",
+ "*Following graph demonstrates the distribution of the pre-defined categories in newspaper clippings of seleceted Austrian Newspapers (790 clippings) on the topic of emigration.* \n",
  "\n",
  "![Collection on the topic of Emigration](images/categories.PNG)\n",
  "\n",