Format token count with thousands separator #74

Open
wants to merge 1 commit into
base: main

Conversation

markus1189

I just discovered code2prompt and it's really useful.

One thing that was a bit cumbersome, however, was having to eyeball how many tokens it counted.

This matters because models typically specify their context length as e.g. 32k, 128k, or 1000k tokens.

When trying to quickly check whether a prompt has a chance of fitting my model's input limit, I was frequently unsure whether the number was 1M, 100k, 10k, etc.

I gave it a quick try and added thousands separators to the token count. The trade-off:

  • makes it easier to check the rough estimate of the tokens
  • makes it harder to parse it out via piping and shell utils
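To make the trade-off concrete, here is a minimal, std-only sketch of digit grouping. It is not the code from this PR: the helper name `format_with_thousands` and the explicit separator argument are assumptions made for illustration, and a locale-aware formatter would pick the separator for you.

```rust
/// Hypothetical helper (not from the PR): groups the decimal digits of `n`
/// in threes, separated by `sep`.
fn format_with_thousands(n: u64, sep: char) -> String {
    let digits = n.to_string();
    let mut out = String::with_capacity(digits.len() + digits.len() / 3);
    for (i, ch) in digits.chars().enumerate() {
        // Insert a separator whenever the number of remaining digits
        // is a multiple of three, except before the first digit.
        if i > 0 && (digits.len() - i) % 3 == 0 {
            out.push(sep);
        }
        out.push(ch);
    }
    out
}

fn main() {
    // prints 1,234,567
    println!("{}", format_with_thousands(1_234_567, ','));
}
```

The grouped output is easier to read at a glance, but a script that previously piped the raw number into other tools would now have to strip the separators first.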

Example of before & after:

image

What do you think?

@ODAncona
Collaborator

Hello, thanks for your contribution.

This is indeed a useful feature.

To get maximum flexibility out of it, I propose the following changes:

Make the --tokens flag parameterized. By default it should produce the "machine-parsable" format, and when you specify it, it should switch to the "human-readable" format, or the other way around.

The usage would become:
code2prompt . --tokens <FORMAT>

This would work the same way as the --sort option.

The FORMAT enum should contain "human" and "machine".
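A rough, std-only sketch of what such an enum could look like. All names here (`TokenFormat`, `render_token_count`) are hypothetical, the comma separator is hard-coded for brevity, and the real CLI wiring (argument parsing, choice of default) is left out.

```rust
use std::str::FromStr;

/// Hypothetical enum for the proposed FORMAT values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TokenFormat {
    Human,
    Machine,
}

impl FromStr for TokenFormat {
    type Err = String;

    // Accept exactly the two values named in the proposal.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "human" => Ok(TokenFormat::Human),
            "machine" => Ok(TokenFormat::Machine),
            other => Err(format!("unknown token format: {other}")),
        }
    }
}

/// Hypothetical renderer: plain digits for scripts, grouped digits for people.
fn render_token_count(count: u64, format: TokenFormat) -> String {
    let digits = count.to_string();
    match format {
        TokenFormat::Machine => digits,
        TokenFormat::Human => {
            let mut out = String::new();
            for (i, ch) in digits.chars().enumerate() {
                // Separator before each complete group of three digits.
                if i > 0 && (digits.len() - i) % 3 == 0 {
                    out.push(',');
                }
                out.push(ch);
            }
            out
        }
    }
}

fn main() {
    let format: TokenFormat = "human".parse().expect("valid format");
    // prints 128,000
    println!("{}", render_token_count(128_000, format));
}
```

Keeping "machine" as plain digits means existing shell pipelines keep working, while "human" is opt-in for interactive use.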

What do you think about it?

Furthermore, the default locale shouldn't be Locale::en but rather SystemLocale::default().

@markus1189
Author

Sounds good, @ODAncona, thank you for your thoughts. I'll get to work on your suggestions.

@ODAncona
Collaborator

Great 👍 We'll forge an incredible and robust tool.
