Format token count with thousands separator #74

markus1189 · 2025-02-15T19:03:48Z

I just discovered code2prompt and it's really useful.

One thing that was a bit cumbersome, however, was having to eyeball how many tokens it counted.

This is very important, because models often specify e.g. 32k, 128k, 1000k as their context length.

So when trying to quickly determine whether the output has a chance of fitting my model's input limit, I was frequently puzzled whether the number was 1M, 100k, 10k, etc.

I did a quick try and added some formatting with thousands separators:

Pro: makes it easier to check the rough estimate of the tokens
Con: makes it harder to parse the count out via piping and shell utils
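For readers following along, the idea of grouping digits can be sketched with the standard library alone. This is only an illustration, not the code in this PR: the function name `format_with_thousands` and the separator parameter are assumptions made for the example.

```rust
/// Hypothetical helper (not from the PR): insert `sep` between every group
/// of three digits, counting from the right.
fn format_with_thousands(n: u64, sep: char) -> String {
    let digits = n.to_string();
    let mut out = String::with_capacity(digits.len() + digits.len() / 3);
    for (i, ch) in digits.chars().enumerate() {
        // A separator goes before this digit when the number of digits
        // still to be written (including this one) is a multiple of three.
        if i > 0 && (digits.len() - i) % 3 == 0 {
            out.push(sep);
        }
        out.push(ch);
    }
    out
}
```

With this sketch, `format_with_thousands(1234567, ',')` yields `1,234,567`, which is much easier to compare against a 128k or 1M context limit than the raw digits.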

Example of before & after:

What do you think?

ODAncona · 2025-02-16T21:15:59Z

Hello, thanks for your contribution.

This is indeed a useful feature.

In order to get the maximal flexibility out of it, I propose the following changes:

Make the --tokens flag parameterized. By default it should produce the "machine parsable" format, and if you specify it, it should switch to the "human readable" format, or the other way around.

The usage would become:
code2prompt . --tokens <FORMAT>

This is the same way the --sort option works.

The FORMAT enum should contain "human" and "machine"
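A rough sketch of what such an enum could look like, using only the standard library. The names `TokenFormat` and `render_token_count` are hypothetical, the comma separator is hard-coded for brevity, and the actual CLI wiring in code2prompt may well differ:

```rust
use std::str::FromStr;

/// Hypothetical output formats for the token count (names are assumptions).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TokenFormat {
    /// Grouped digits, e.g. "128,000".
    Human,
    /// Plain digits, e.g. "128000", easy to pipe into shell utilities.
    Machine,
}

impl FromStr for TokenFormat {
    type Err = String;

    // Parse the FORMAT value case-insensitively.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "human" => Ok(TokenFormat::Human),
            "machine" => Ok(TokenFormat::Machine),
            other => Err(format!("unknown token format: {other}")),
        }
    }
}

/// Render a token count according to the chosen format.
fn render_token_count(count: u64, format: TokenFormat) -> String {
    match format {
        TokenFormat::Machine => count.to_string(),
        TokenFormat::Human => {
            let digits = count.to_string();
            let mut out = String::new();
            for (i, ch) in digits.chars().enumerate() {
                // Separator before each group of three remaining digits.
                if i > 0 && (digits.len() - i) % 3 == 0 {
                    out.push(',');
                }
                out.push(ch);
            }
            out
        }
    }
}
```

Keeping "machine" as a plain digit string preserves the piping use case raised above, while "human" is purely a display concern.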

What do you think?

Furthermore, the default locale shouldn't be Locale::en but rather SystemLocale::default().

markus1189 · 2025-02-17T07:16:58Z

Sounds good @ODAncona thank you for your thoughts. Will get to work on your suggestions

ODAncona · 2025-02-17T09:31:21Z

Great 👍 we'll forge an incredible and robust tool.

Format token count with thousands separator

ea88642
