Implement hero.describe(s) #166

mk2510 · 2020-08-26T10:12:53Z

pandas has an s.describe() method that provides summary statistics about a Series or DataFrame. Example:

>>> s = pd.read_csv("https://raw.githubusercontent.com/jbesomi/texthero/master/dataset/bbcsport.csv")["text"]
>>> s.describe()
count                                                   737
unique                                                  727
top       India's top six secure - Ganguly\n\nCaptain So...
freq                                                      2
Name: text, dtype: object

We can see that it's not all that useful for text data.

TODO

Implement a function hero.describe(s) where s is a TextSeries (a Series in which every cell is a string, like s in the example above). The output might include:

number of documents (= count from above = len(s))
number of unique documents (= unique from above = len(s.unique()))
number of empty / missing (NaN) documents (= (~hero.has_content(s)).values.sum())
most common single words (= s.pipe(hero.tokenize).pipe(hero.top_words)[:10] for the 10 most common words)
average document length in tokens (= s.pipe(hero.tokenize).map(len).mean())
length of the shortest document in tokens (= s.pipe(hero.tokenize).map(len).min())
length of the longest document in tokens (= s.pipe(hero.tokenize).map(len).max())

The result should be a nicely formatted Series like the one above.
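To make the proposal concrete, here is a minimal sketch of what such a function could look like. It is not the texthero implementation: describe_sketch is a hypothetical name, plain whitespace splitting stands in for hero.tokenize, collections.Counter stands in for hero.top_words, and a NaN / whitespace-only check stands in for hero.has_content, so the numbers may differ from what texthero itself would report.

```python
from collections import Counter

import pandas as pd


def describe_sketch(s: pd.Series, n_top_words: int = 10) -> pd.Series:
    """Summarize a Series of text documents (hypothetical helper, not texthero's API)."""
    # Assumption: NaN and whitespace-only cells count as empty / missing.
    text = s.fillna("").astype(str)
    has_content = s.notna() & (text.str.strip() != "")

    # Assumption: whitespace tokenization stands in for hero.tokenize.
    tokens = text.str.split()
    lengths = tokens.map(len)  # empty documents contribute a length of 0

    # Assumption: a simple word count stands in for hero.top_words.
    word_counts = Counter(word for doc in tokens for word in doc)
    top_words = [word for word, _ in word_counts.most_common(n_top_words)]

    return pd.Series(
        {
            "number of documents": len(s),
            "number of unique documents": len(s.unique()),
            "number of empty / missing documents": int((~has_content).sum()),
            "most common words": top_words,
            "average document length": lengths.mean(),
            "length of shortest document": lengths.min(),
            "length of longest document": lengths.max(),
        },
        name="description",
    )


if __name__ == "__main__":
    example = pd.Series(["a b a", "c a", None, "  ", "a b a"])
    print(describe_sketch(example))
```

Returning a Series keyed by readable labels keeps the output close in spirit to what s.describe() prints, while leaving room to add or drop rows later.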

We believe this is a great way for our users to get first insights into their text datasets.

henrifroese · 2020-08-26T10:18:35Z

Will start work on this now 🍺

mk2510 · 2020-08-26T10:19:12Z

Great 🍻

k0pernicus · 2020-12-04T11:44:01Z

Solved in #168, if I'm not mistaken. I think we can close this issue :)

jbesomi · 2020-12-04T11:51:09Z

Thanks, Antoin! Keep up the good work!! 👍

mk2510 added the enhancement (New feature or request), help wanted (Extra attention is needed), and good first issue (Good for newcomers) labels Aug 26, 2020

mk2510 changed the title ~~Implement hero.info(s: TextSeries)~~ Implement hero.info(s) Aug 26, 2020

mk2510 changed the title ~~Implement hero.info(s)~~ Implement hero.describe(s) Aug 26, 2020

mk2510 assigned henrifroese Aug 26, 2020

henrifroese mentioned this issue Aug 26, 2020

Implement hero.describe(s), Closes #166 #168

jbesomi closed this as completed Dec 4, 2020
