Stop words removal #5

Open
geraldstanje opened this issue Aug 1, 2014 · 2 comments
Comments

@geraldstanje

Hi,

I am currently testing it with Go 1.3.
Does go-porterstemmer also remove stop words, or can you suggest a library?

Gerald

@reiver
Owner

reiver commented Aug 2, 2014

Hello Gerald,

The "go-porterstemmer" algorithm does NOT remove stop words.

(It only implements the Porter stemming algorithm, in Go.)

But you could code it yourself pretty easily.

First, you want a list of all the English stop words.

There are various lists of them on the Internet. Here are some lists:

Then just extract that list of stop words and get it into your Go code.
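A minimal sketch of that step, assuming the list is pasted into the source as a newline-separated string. The seven words in `stopWordList` are only a placeholder sample, not a real stop-word list, and `loadStopWords` is a hypothetical helper name. Storing the words in a map gives constant-time lookups.

```go
package main

import (
	"fmt"
	"strings"
)

// stopWordList is a tiny placeholder sample; paste in (or load from a file)
// one of the published English stop-word lists instead.
const stopWordList = `
a
an
and
the
is
of
to
`

// loadStopWords turns a newline-separated list into a lower-cased set,
// skipping blank lines.
func loadStopWords(list string) map[string]struct{} {
	set := make(map[string]struct{})
	for _, line := range strings.Split(list, "\n") {
		w := strings.ToLower(strings.TrimSpace(line))
		if w == "" {
			continue
		}
		set[w] = struct{}{}
	}
	return set
}

func main() {
	stop := loadStopWords(stopWordList)
	_, ok := stop["the"]
	fmt.Println(len(stop), ok) // prints "7 true"
}
```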

And before you send a word to "go-porterstemmer", check whether it is in your list of stop words.

@geraldstanje
Author

Hello Charles,

Thanks for your reply. I've already implemented it :)

Does the Porter stemmer handle Unicode? E.g. don\u2019t (with a typographic apostrophe).
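The thread does not say how go-porterstemmer treats typographic apostrophes. One defensive option, sketched here as an assumption rather than anything the library does, is to map them to the ASCII apostrophe before further processing. `normalizeApostrophes` is a hypothetical helper; `strings.NewReplacer` is from the Go standard library.

```go
package main

import (
	"fmt"
	"strings"
)

// apostropheNormalizer maps common typographic apostrophes to ASCII '.
var apostropheNormalizer = strings.NewReplacer(
	"\u2019", "'", // RIGHT SINGLE QUOTATION MARK, as in don’t
	"\u2018", "'", // LEFT SINGLE QUOTATION MARK
	"\u02BC", "'", // MODIFIER LETTER APOSTROPHE
)

// normalizeApostrophes rewrites s so later steps only need to match '.
func normalizeApostrophes(s string) string {
	return apostropheNormalizer.Replace(s)
}

func main() {
	fmt.Println(normalizeApostrophes("don\u2019t")) // prints "don't"
}
```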

Does the Porter stemmer handle negated contractions like don’t, doesn’t, won’t, can’t,
mustn’t, isn’t, aren’t, wasn’t, weren’t, couldn’t, shouldn’t, wouldn’t?
They should be transformed into do not, will not, cannot, must not, is not,
etc.
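A minimal sketch of such an expansion step, run before stop-word filtering and stemming. It is not part of go-porterstemmer; the table is hand-written from the list above and is not exhaustive. It assumes apostrophes are already ASCII and splits on whitespace only, so a token with attached punctuation such as `won't,` would not match. `expandNegations` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// negations maps lower-case negated contractions to their expansions.
var negations = map[string]string{
	"don't":     "do not",
	"doesn't":   "does not",
	"won't":     "will not",
	"can't":     "cannot",
	"mustn't":   "must not",
	"isn't":     "is not",
	"aren't":    "are not",
	"wasn't":    "was not",
	"weren't":   "were not",
	"couldn't":  "could not",
	"shouldn't": "should not",
	"wouldn't":  "would not",
}

// expandNegations replaces each whitespace-separated token found in the
// table (matched case-insensitively) with its lower-case expansion.
func expandNegations(text string) string {
	tokens := strings.Fields(text)
	for i, t := range tokens {
		if exp, ok := negations[strings.ToLower(t)]; ok {
			tokens[i] = exp
		}
	}
	return strings.Join(tokens, " ")
}

func main() {
	fmt.Println(expandNegations("I can't say it won't work"))
	// prints "I cannot say it will not work"
}
```

Note that "not" is on many stop-word lists, so if negation matters downstream it may need to be removed from the list.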

Thanks,
Gerald
