Implement hero.describe(s) #166
Labels: enhancement, good first issue, help wanted
In pandas, there is an `s.describe()` function that provides summary statistics about a Series or DataFrame. Example:

We can see that it's not all that useful for text data.
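To illustrate the point, here is a minimal sketch of what pandas reports for a Series of strings. The sample data is made up for illustration. For non-numeric data, `describe()` returns only `count`, `unique`, `top` and `freq`, which says little about the text itself (note that the empty string still counts as a value).

```python
import pandas as pd

# Made-up sample data: four short "documents", one of them empty.
s = pd.Series(["Hello world", "Hello world", "Text analysis with pandas", ""])

# For string data, describe() reports count, unique, top and freq.
summary = s.describe()
print(summary)
```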
TODO
Implement a function `hero.describe(s)` where `s` is a TextSeries (so a Series where every cell is a string, like `s` in the example above). Output might include:

- `len(s)`
- `len(s.unique())`
- `(~hero.has_content(s)).values.sum()`
- `s.pipe(hero.tokenize).pipe(hero.top_words)[:10]` (for the 10 most common words)
- `s.pipe(hero.tokenize).map(lambda x: len(x)).mean()`
- `s.pipe(hero.tokenize).map(lambda x: len(x)).min()`
- `s.pipe(hero.tokenize).map(lambda x: len(x)).max()`

The result should be a nice-looking Series like above.
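The statistics listed above could be combined roughly as follows. This is only a sketch, not the texthero implementation. The function name `describe_text` and the row labels are hypothetical. To keep the example runnable with pandas alone, a naive whitespace split stands in for `hero.tokenize`, a "not NaN and not blank" check stands in for `hero.has_content`, and a `Counter` stands in for `hero.top_words`.

```python
from collections import Counter

import pandas as pd


def describe_text(s: pd.Series, n_top: int = 10) -> pd.Series:
    """Hypothetical sketch of a text-oriented describe() for a Series of strings."""
    filled = s.fillna("")

    # Stand-in for hero.has_content: a cell has content if it is not NaN and not blank.
    has_content = s.notna() & (filled.str.strip() != "")

    # Stand-in for hero.tokenize: naive whitespace tokenization.
    tokens = filled.str.split()
    words_per_doc = tokens.map(len)

    # Stand-in for hero.top_words: count every token across all documents.
    counts = Counter(tok for doc in tokens for tok in doc)
    top_words = [word for word, _ in counts.most_common(n_top)]

    return pd.Series(
        {
            "number of documents": len(s),
            "number of unique documents": len(s.unique()),
            "number of missing documents": int((~has_content).sum()),
            "most common words": ", ".join(top_words),
            "mean words per document": words_per_doc.mean(),
            "min words per document": words_per_doc.min(),
            "max words per document": words_per_doc.max(),
        }
    )


# Made-up sample data, including an empty and a missing document.
docs = pd.Series(["the cat sat", "the dog", "", None, "the cat sat"])
print(describe_text(docs))
```

Joining the top words into one string keeps every cell of the result a scalar, so the output prints as a compact Series. A real implementation would presumably build on the texthero functions named in the list instead of these stand-ins.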
We believe this is a great way for our users to get first insights into their text datasets.