Add tokens count in TS SDK #16

Closed
alexeygoncharov opened this issue Apr 16, 2023 · 14 comments

Comments

@alexeygoncharov

Does the TS SDK have a token-counting function like the one shown here? https://colab.research.google.com/drive/1cY_YPaV0KUdveAj_nwy7V8gVEDZh_GDz#scrollTo=XBm0Dz-P-8Fm

@chitalian

+1

@kakhulu31

+1

@chitalian

I dug a bit into this.

There seem to be a few approaches.

  1. Mirror the Python implementation: https://github.com/anthropics/anthropic-sdk-python/blob/main/anthropic/tokenizer.py#L51
    However, this would require us to get the tokenizer to compile to WASM. Related issues:
    "JS / WebAssembly binding planned?" huggingface/tokenizers#63
    "Support wasm" huggingface/tokenizers#935

  2. We can mirror what is happening here: https://github.com/niieani/gpt-tokenizer/blob/main/src/main.ts#L129-L144
    But I think this way is a bit more sketchy.

Any other ideas for supporting this that are a bit easier or lower-hanging?

@eek

eek commented May 24, 2023

Maybe we can use https://github.com/dqbd/tiktoken since it's a BPE tokenizer as well and works great.
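For readers unfamiliar with the term, here is a toy sketch of the merge loop at the heart of a BPE (byte-pair encoding) tokenizer. It is not how tiktoken or any Anthropic tokenizer is actually implemented: the `bpe` function and the three-entry `merges` rank table are made up for illustration, and real tokenizers work on bytes with vocabularies of many thousands of merges.

```typescript
// Toy BPE: repeatedly merge the adjacent pair with the lowest rank.
// `merges` is a hypothetical rank table, NOT a real vocabulary.
function bpe(word: string, merges: Map<string, number>): string[] {
  let parts: string[] = Array.from(word); // start from single characters
  while (parts.length > 1) {
    // Find the adjacent pair with the lowest rank (highest priority).
    let best = -1;
    let bestRank = Infinity;
    for (let i = 0; i < parts.length - 1; i++) {
      const rank = merges.get(`${parts[i]} ${parts[i + 1]}`);
      if (rank !== undefined && rank < bestRank) {
        bestRank = rank;
        best = i;
      }
    }
    if (best === -1) break; // no mergeable pair left
    // Merge the pair at `best` into a single symbol.
    parts = [
      ...parts.slice(0, best),
      `${parts[best]}${parts[best + 1]}`,
      ...parts.slice(best + 2),
    ];
  }
  return parts;
}

// Hypothetical merge ranks: lower number = merged earlier.
const merges = new Map<string, number>([
  ["l o", 0],
  ["lo w", 1],
  ["e r", 2],
]);

const tokens = bpe("lower", merges);
console.log(tokens, tokens.length);
```

The token count is simply the length of the resulting array, which is why any library exposing the right merge table can double as a token counter.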

@niieani

niieani commented May 26, 2023

Author of gpt-tokenizer here. Just FYI, I've rewritten gpt-tokenizer to be a complete TypeScript port of OpenAI's tiktoken (with some extra features sprinkled on top). Would be happy to accept a PR to include support for Anthropic's models.

@transitive-bullshit

@bobber205

This is badly needed and important for determining cost/usage.

@rattrayalex
Collaborator

Good news – we expect to have a separate package with this functionality released within a few days!

@bobber205

@rattrayalex That's awesome! I know it's a big ask but do you mind commenting on this thread when it's available?

@rattrayalex
Collaborator

rattrayalex commented Jun 28, 2023 via email

@rattrayalex
Collaborator

This is now released! https://www.npmjs.com/package/@anthropic-ai/tokenizer

@bobber205

@rattrayalex Does this help with completion token counts? That's the much more expensive one.

@rattrayalex
Collaborator

Yes, you can use this to tokenize the responses as well.
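As a rough sketch of what that could look like for usage tracking, the snippet below counts prompt and completion separately. The counter is passed in as a parameter so the example runs on its own; `naiveCount` is a hypothetical whitespace-based stand-in and does not produce real token counts. In practice one would presumably pass `countTokens` from `@anthropic-ai/tokenizer` instead. The `Usage` type and `measureUsage` function are illustrative names, not part of any SDK.

```typescript
// A token counter maps text to a token count.
type TokenCounter = (text: string) => number;

interface Usage {
  promptTokens: number;
  completionTokens: number;
  totalTokens: number;
}

// Hypothetical stand-in: counts whitespace-separated words, NOT real tokens.
// Swap in a real tokenizer's counting function for meaningful numbers.
const naiveCount: TokenCounter = (text) => {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
};

// Count the prompt and the completion separately, since they are
// typically billed at different rates.
function measureUsage(
  prompt: string,
  completion: string,
  count: TokenCounter,
): Usage {
  const promptTokens = count(prompt);
  const completionTokens = count(completion);
  return {
    promptTokens,
    completionTokens,
    totalTokens: promptTokens + completionTokens,
  };
}

const usage = measureUsage("Hello there, Claude", "Hi! How can I help?", naiveCount);
console.log(usage);
```

Keeping the counter injectable also makes it easy to swap tokenizers per model without touching the accounting code.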

@bobber205

D'oh! Of course! Thanks!


8 participants