Hi, could you paste the actual data you're using? (Just one of the texts would probably be enough.)
For me with the beginning of your first text, the punctuation is removed successfully:
>>> import texthero as hero
>>> import pandas as pd
>>> s = pd.Series(["Honestly people don't know about the fact ..."])
>>> hero.clean(s)
0    honestly people know fact
dtype: object
The issue is probably that some punctuation in your text is not "standard" punctuation (texthero internally uses string.punctuation from Python's string module, so anything that's not in there won't be removed).
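To check whether that is what's happening, here is a minimal sketch using only the standard library. It lists the characters in a text that Unicode classifies as punctuation but that are missing from string.punctuation, and replaces them with spaces. The helper names non_standard_punctuation and strip_unicode_punctuation and the sample string are made up for illustration; they are not part of texthero, and the sample is not your actual data.

```python
import string
import unicodedata


def non_standard_punctuation(text):
    """Return the characters that Unicode classes as punctuation (category P*)
    but that do not appear in string.punctuation."""
    return {
        ch
        for ch in text
        if unicodedata.category(ch).startswith("P")
        and ch not in string.punctuation
    }


def strip_unicode_punctuation(text):
    """Replace every character in a Unicode punctuation category with a space.

    Note: characters such as $ or + are Unicode symbols (category S*),
    so this hypothetical helper leaves them alone.
    """
    return "".join(
        " " if unicodedata.category(ch).startswith("P") else ch
        for ch in text
    )


# Illustrative sample with curly quotes and a one-character ellipsis.
sample = "Honestly people don\u2019t know \u201cthe fact\u201d \u2026"

print(sorted(non_standard_punctuation(sample)))
print(strip_unicode_punctuation(sample))
```

If the first print shows characters like curly quotes or a single-character ellipsis, one option is to run something like s.apply(strip_unicode_punctuation) on the Series before calling hero.clean(s).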
This is my code and I was trying to clean a large dataset
According to the documentation, this is the default pipeline for the clean functionality:
But my output does not reflect this, as some of the punctuation remained in the text.
Original text column
Preprocessed text column