File sizes with thousand separators #48

Open
LordGaav opened this issue Dec 3, 2020 · 2 comments

LordGaav commented Dec 3, 2020

I'm trying to parse file sizes with thousands separators, but I'm having no luck. With humanfriendly==9.0, I get the following:

$ python -i
Python 3.8.5 (default, Oct  6 2020, 07:21:17) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import humanfriendly
>>> humanfriendly.parse_size("1,067.6 KB")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "vendor/lib/python3.8/site-packages/humanfriendly/__init__.py", line 259, in parse_size
    raise InvalidSize(format(msg, size, tokens))
humanfriendly.InvalidSize: Failed to parse size! (input '1,067.6 KB' was tokenized as [1, ',', 67.6, 'KB'])

Can humanfriendly handle this? I can't seem to find a way to tell humanfriendly to expect a thousands separator (my data is fairly uniform; the separator is always the same).
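
A minimal caller-side workaround, assuming the separator is always a comma (the parse_size_lenient helper below is made up for illustration, not part of the humanfriendly API):

import humanfriendly

def parse_size_lenient(size_text, separator=","):
    # Hypothetical helper: drop the (assumed uniform) thousands
    # separator before handing the string to humanfriendly.
    return humanfriendly.parse_size(size_text.replace(separator, ""))

print(parse_size_lenient("1,067.6 KB"))  # -> 1067600 (decimal KB)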

LordGaav (Author) commented Dec 3, 2020

The most straightforward fix would be to just strip out the thousands separator. tokenize doesn't seem to handle locales anyway, and expects a float-like string with a unit:

diff --git a/humanfriendly/text.py b/humanfriendly/text.py
index a257a6a..de28a41 100644
--- a/humanfriendly/text.py
+++ b/humanfriendly/text.py
@@ -422,6 +422,8 @@ def tokenize(text):
     >>> tokenize('42.5 MB')
     [42.5, 'MB']
     """
+    # Strip out thousands separators
+    text = text.replace(",", "")
     tokenized_input = []
     for token in re.split(r'(\d+(?:\.\d+)?)', text):
         token = token.strip()
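
With this patch applied, the failing input above should tokenize as a single number; a sketch of the expected behavior (not verified output):

>>> from humanfriendly.text import tokenize
>>> tokenize('1,067.6 KB')
[1067.6, 'KB']
>>> import humanfriendly
>>> humanfriendly.parse_size('1,067.6 KB')
1067600

One caveat: this strips commas anywhere in the string, so any input that legitimately contains a comma would silently change meaning; a stricter variant could remove commas only when they sit between digits.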

riaqn commented Feb 14, 2021

Can you maybe open a PR?
