Improvements and bugfixes #142

csavelief · 2018-09-26T12:00:19Z

Improvements to StringIterator:

- Add better tests (easier to read).
- Classify char code 160 (non-breaking space) as whitespace.
- Split text cleaning from sentence parsing. It now works quite well for formatting text extracted from PDF files with Apache Tika.
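
To illustrate the char code 160 item: Java's `Character.isWhitespace` returns `false` for the non-breaking space (U+00A0), so it has to be special-cased. The following is only a minimal sketch of that idea, not the code from this pull request; the class `WhitespaceSketch` and its methods are hypothetical names chosen for the example.

```java
public final class WhitespaceSketch {
    /** Code point 160, the non-breaking space (U+00A0). */
    public static final char NBSP = '\u00A0';

    /** Hypothetical helper: treats NBSP as whitespace in addition to Java's own definition. */
    public static boolean isWhitespaceOrNbsp(char c) {
        return c == NBSP || Character.isWhitespace(c);
    }

    /** Hypothetical helper: collapses runs of whitespace (including NBSP) into one space. */
    public static String normalizeSpaces(String text) {
        StringBuilder out = new StringBuilder(text.length());
        boolean previousWasSpace = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isWhitespaceOrNbsp(c)) {
                if (!previousWasSpace) {
                    out.append(' ');
                }
                previousWasSpace = true;
            } else {
                out.append(c);
                previousWasSpace = false;
            }
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(Character.isWhitespace(NBSP));          // false
        System.out.println(isWhitespaceOrNbsp(NBSP));              // true
        System.out.println(normalizeSpaces("topic" + NBSP + " model")); // topic model
    }
}
```

Text extracted from PDF files often contains such characters between words, which is presumably why the distinction matters here.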

Closes issue 104: TokenInstanceIterator does not iterate over more than one Instance.

Closes issue 126: printDocumentTopics() throws an IndexOutOfBoundsException if the number of topics is not the same as the number of documents.
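
An exception that only appears when the topic count differs from the document count usually points to a loop bounded by the wrong dimension. The sketch below is not Mallet's `printDocumentTopics()` implementation; `DocTopicSketch` and `formatDocumentTopics` are hypothetical names, and the document-by-topic array layout is an assumption made for the example. It only shows the safe pattern of bounding each loop by the dimension it actually indexes.

```java
public final class DocTopicSketch {
    /**
     * Hypothetical helper: formats one line per document as
     * "docIndex<TAB>p(topic 0)<TAB>p(topic 1)...".
     * Assumes docTopics[doc][topic] holds the topic proportions.
     */
    public static String formatDocumentTopics(double[][] docTopics) {
        StringBuilder out = new StringBuilder();
        for (int doc = 0; doc < docTopics.length; doc++) {
            out.append(doc);
            // Bound the inner loop by this row's topic count, never by the
            // number of documents; mixing the two only works when they are equal.
            for (int topic = 0; topic < docTopics[doc].length; topic++) {
                out.append('\t').append(docTopics[doc][topic]);
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Two documents, three topics: the counts deliberately differ.
        double[][] docTopics = {
            {0.7, 0.2, 0.1},
            {0.1, 0.1, 0.8}
        };
        System.out.print(formatDocumentTopics(docTopics));
    }
}
```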

Be aware that, due to compilation issues on Windows 10, I had to remove the symlink from lib/errorprone.jar in the build.xml file (commit ec265c3). Let me know if I should roll it back for the pull request.

csavelief · 2019-05-11T12:00:25Z

Update (2019-05-11):

- Rebase MNCC/Mallet/master onto mimno/Mallet/master
- Fix compilation issue (remove Google Guava)

- Add better tests (easier to read).
- Classify char code 160 (non-breaking space) as whitespace.
- Split text cleaning from sentence parsing. It now works quite well for formatting text extracted from PDF files with Apache Tika.
- Closes issue 104: TokenInstanceIterator does not iterate over more than one Instance.
- Closes issue 126: printDocumentTopics() throws an IndexOutOfBoundsException if the number of topics is not the same as the number of documents.

csavelief changed the title ~~Improvements on StringIterator~~ Improvements and bugfixes Sep 26, 2018

csavelief added 9 commits June 27, 2019 13:20

Minor heuristic to detect bullet points

97069e4

Squash commits

35b6040

Merge remote-tracking branch 'origin/master'

7c0a048

Minor heuristic to detect bullet points

3608d54

Squash commits

34fead1

Merge remote-tracking branch 'origin/master'

727e241

Merge branch 'master' of https://github.com/MNCC/Mallet into HEAD

ad5495c

csavelief closed this Jun 27, 2019
