implement get_num_tokens to use google's count_tokens function #10565
Conversation
Gets the correct token count instead of relying on the GPT-2 model.
@lkuligin @eyurtsev @hwchase17
@holtskinner Can you help me review this PR, please?
@baskaryan Can you help me review this?
@lkuligin @eyurtsev @hwchase17 @holtskinner
I don't think there's a need to add an additional client and deal with it. You can use
@lkuligin Please correct me if I am wrong, but it looks like TextGenerationModel doesn't have a count_tokens function:
>>> llm = VertexAI()
>>> "count_tokens" in dir(llm.client)
False
>>> llm.client.count_tokens
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: '_PreviewTextGenerationModel' object has no attribute 'count_tokens'
Oh, this function ships in google-cloud-aiplatform v1.34.0, which was released on Oct 6.
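(For reference, a minimal sketch of the underlying SDK call, assuming a google-cloud-aiplatform version >= 1.34.0 where count_tokens is available on the model class; not necessarily the PR's exact code.)

from vertexai.language_models import TextGenerationModel


def count_vertex_tokens(text: str) -> int:
    # Assumes vertexai.init() / GCP credentials are already configured.
    model = TextGenerationModel.from_pretrained("text-bison@001")
    # count_tokens returns a response exposing total_tokens
    # (and total_billable_characters) for the given prompts.
    return model.count_tokens([text]).total_tokens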
@lkuligin I made some modifications based on your suggestions. Please review it again when you are available. Thank you. :)
Hi @lkuligin:
@lkuligin Sorry to bother you. Could you please review this change?
@lkuligin @eyurtsev @hwchase17 @holtskinner
I can't approve/merge this PR, but LGTM.
@@ -97,3 +100,26 @@ async def test_model_garden_agenerate() -> None:
     output = await llm.agenerate(["What is the meaning of life?", "How much is 2+2"])
     assert isinstance(output, LLMResult)
     assert len(output.generations) == 2
+
+
+def test_vertex_call_trigger_count_tokens(mocker) -> None:
Nit: I'm not sure this qualifies as an integration test. Probably, for an integration test we should actually invoke a call to VertexAI and check that it returns results successfully, wdyt?
Hi @lkuligin:
For this test case (test_vertex_call_trigger_count_tokens), I modified it so that it calls the underlying function directly. I also created another test case, "test_get_num_tokens_be_called_when_using_mapreduce_chain", because token usage isn't included in the chain output, so it's hard for me to verify whether count_tokens gets called when a user runs the map-reduce method. That's why I decided to use a mocker to test this logic.
Please let me know your thoughts, or any suggestions to make it better. :)
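(As a rough sketch, not the PR's actual test body: a pytest-mock based check could look roughly like the lines below, assuming llm.client exposes count_tokens and that get_num_tokens forwards the text as a single-element list.)

from unittest.mock import MagicMock

from langchain.llms import VertexAI


def test_get_num_tokens_uses_client_count_tokens(mocker) -> None:
    # Hypothetical test; assumes GCP project/credentials are configured
    # so that VertexAI() can build its underlying client.
    llm = VertexAI()
    fake_response = MagicMock(total_tokens=6)
    spy = mocker.patch.object(llm.client, "count_tokens", return_value=fake_response)

    assert llm.get_num_tokens("How are you?") == 6
    # Assumption: the text is forwarded to count_tokens as a one-element list.
    spy.assert_called_once_with(["How are you?"])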
…feature/enhance-vertexai-llm
Gets the correct token count instead of relying on the GPT-2 model.
Description:
Implement get_num_tokens within the VertexAI LLM to use Google's count_tokens function (https://cloud.google.com/vertex-ai/docs/generative-ai/get-token-count), so we don't need to download the GPT-2 tokenizer from Hugging Face, and map-reduce chains get the correct token count.
Tag maintainer:
@lkuligin
Twitter handle:
My twitter: @abehsu1992626
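(A hedged usage sketch of what this change enables, assuming GCP credentials are configured: the count comes from Vertex AI's count_tokens endpoint rather than a locally downloaded GPT-2 tokenizer.)

from langchain.llms import VertexAI

llm = VertexAI()
# Token count now comes from Google's count_tokens API, not GPT-2.
print(llm.get_num_tokens("What is the meaning of life?"))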