File sizes with thousand separators #48

Open
LordGaav opened this issue Dec 3, 2020 · 2 comments

LordGaav commented Dec 3, 2020

I'm trying to parse file sizes with thousands separators, but I'm having no luck. With humanfriendly==9.0, I get the following:

$ python -i
Python 3.8.5 (default, Oct  6 2020, 07:21:17) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import humanfriendly
>>> humanfriendly.parse_size("1,067.6 KB")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "vendor/lib/python3.8/site-packages/humanfriendly/__init__.py", line 259, in parse_size
    raise InvalidSize(format(msg, size, tokens))
humanfriendly.InvalidSize: Failed to parse size! (input '1,067.6 KB' was tokenized as [1, ',', 67.6, 'KB'])

Can humanfriendly handle this? I can't seem to find a way to tell humanfriendly to expect a thousands separator (my data is fairly uniform; the separator is always the same).
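
A minimal caller-side workaround, assuming the separator is always a comma (the parse_size_lenient helper below is made up for illustration, not part of the humanfriendly API):

import humanfriendly

def parse_size_lenient(size_text, separator=","):
    # Hypothetical helper: drop the (assumed uniform) thousands
    # separator before handing the string to humanfriendly.
    return humanfriendly.parse_size(size_text.replace(separator, ""))

print(parse_size_lenient("1,067.6 KB"))  # -> 1067600 (decimal KB)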

LordGaav (Author) commented Dec 3, 2020

The most straightforward fix would be to just strip out the thousands separator. tokenize doesn't seem to handle locales anyway, and expects a float-like string with a unit:

diff --git a/humanfriendly/text.py b/humanfriendly/text.py
index a257a6a..de28a41 100644
--- a/humanfriendly/text.py
+++ b/humanfriendly/text.py
@@ -422,6 +422,8 @@ def tokenize(text):
     >>> tokenize('42.5 MB')
     [42.5, 'MB']
     """
+    # Strip out thousands separators
+    text = text.replace(",", "")
     tokenized_input = []
     for token in re.split(r'(\d+(?:\.\d+)?)', text):
         token = token.strip()
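
With this patch applied, the failing input above should tokenize as a single number; a sketch of the expected behavior (not verified output):

>>> from humanfriendly.text import tokenize
>>> tokenize('1,067.6 KB')
[1067.6, 'KB']
>>> import humanfriendly
>>> humanfriendly.parse_size('1,067.6 KB')
1067600

One caveat: this strips commas anywhere in the string, so any input that legitimately contains a comma would silently change meaning; a stricter variant could remove commas only when they sit between digits.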

riaqn commented Feb 14, 2021

Can you maybe open a PR?
