
Add a non-token approach for OpenAI #16

Open
martinezpl wants to merge 4 commits into main

Conversation

@martinezpl commented May 6, 2023

Hello! As per the interest shown in #2, I'd like to propose a variation for OpenAI models. It makes it possible to use OpenAI models with jsonformer without changing existing code.

Summary

  • Added the possibility of filling JSONs through calls to a non-chat OpenAI completion model (seems to work best with curie)
  • Two new classes: OpenAIModel and JsonformerNoTokens, which is essentially the original stripped of the tokenizer
  • Instead of using logits in generate_boolean and generate_array, we get the next most likely token from logprobs (see the sketch below)
  • As a substitute for stopping criteria, the stop sequence ", is used to limit the model's response to a single value
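
To make the logprobs idea concrete, here is a minimal sketch (not the PR's exact code) of how a boolean can be picked from top logprobs and how the stop sequence limits a completion to a single value, using the 2023-era openai-python (< 1.0) Completion API; the helper names and the token heuristic are illustrative.

```python
import openai

# Minimal sketch of the logprobs idea, not the PR's exact code.
# Uses the 2023-era openai-python (< 1.0) Completion API.
def generate_boolean_sketch(prompt: str, model: str = "text-curie-001") -> bool:
    response = openai.Completion.create(
        model=model,
        prompt=prompt,
        max_tokens=1,
        temperature=0.0,
        logprobs=5,  # return the top-5 candidate tokens with their logprobs
    )
    # top_logprobs[0] maps candidate tokens to logprobs for the first generated position
    top = response["choices"][0]["logprobs"]["top_logprobs"][0]
    # Simplified heuristic: pick whichever of true/false is more probable among the candidates.
    true_lp = max((lp for tok, lp in top.items() if tok.strip().lower().startswith("t")), default=float("-inf"))
    false_lp = max((lp for tok, lp in top.items() if tok.strip().lower().startswith("f")), default=float("-inf"))
    return true_lp > false_lp

def generate_string_sketch(prompt: str, model: str = "text-curie-001") -> str:
    # The stop sequence '",' ends the completion after a single string value.
    response = openai.Completion.create(
        model=model,
        prompt=prompt,
        max_tokens=64,
        temperature=0.3,
        stop=['",'],
    )
    return response["choices"][0]["text"].strip().strip('"')
```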

Why no chat?

I found the chat model to be more querulous (As an AI model I cannot blablabla...), prompt-dependent, and slow.
The solution proposed here seems to work best with text-curie-001, as it's super fast and cheap.

Perhaps somebody can figure out an effective way to utilise the chat model, but I can't see any option other than prompting it to generate the whole JSON at once, which runs completely counter to the concept of this project.

Why no tokens?

I spent some time trying to continue operating on tokens while using the API, but I encountered two issues:

  • the tokenization available in tiktoken does not seem to preserve word boundaries; for example, encoding "colors" gives two separate tokens which, decoded one by one, give "col ors". It can't work like that. (See the sketch below.)
  • chat models do not accept tokens as input

And of course, because the models run remotely, we have no access to the generation process. From my perspective, that alone renders all token operations pointless here. I've still left the tokenizer class in, just in case.
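
A quick way to see the word-boundary issue (the encoding name and the exact split are assumptions and vary by model):

```python
import tiktoken

# Illustration of the word-boundary issue; the exact split depends on the encoding.
enc = tiktoken.get_encoding("r50k_base")  # assumption: a GPT-3-era encoding
tokens = enc.encode("colors")
print(tokens)                              # more than one token id
print([enc.decode_single_token_bytes(t).decode() for t in tokens])  # e.g. ['col', 'ors']
# Decoding the ids together recovers the word, but per-token decoding loses the boundary.
print(enc.decode(tokens))                  # 'colors'
```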

How to run it?

Make sure you have the OPENAI_API_KEY environment variable set.

poetry install
poetry run python tests/test_openai.py

You'll see the JSON being filled. You can change the model used and its temperature in that file when initialising JsonformerNoTokens (see the sketch below).
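
For reference, a hypothetical sketch of what that initialisation might look like; the import path, constructor arguments, and schema are assumptions based on the description above, not the PR's exact code.

```python
# Hypothetical sketch; import path and constructor arguments are assumptions,
# not the PR's exact code.
from jsonformer import JsonformerNoTokens, OpenAIModel

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "number"},
        "is_student": {"type": "boolean"},
    },
}

model = OpenAIModel("text-curie-001")      # change the model here
jsonformer = JsonformerNoTokens(
    model=model,
    json_schema=schema,
    prompt="Generate a person's information based on the following schema:",
    temperature=0.3,                       # and the temperature here
)
print(jsonformer())
```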

@zhaochenyang20

I am just pondering that, as far as I know, the chat model is an inherently good JSON generator. GPT-3.5-turbo is derived from code-davinci-003 (Codex), which is fine-tuned on a large amount of code and is really capable of generating JSON.

@zhaochenyang20

I do have a deep interest in generating JSON output from OpenAI models. Please feel free to contact me!

@martinezpl
Author

@zhaochenyang20 I agree it is fairly good at it, but the authors of this project don't seem to be convinced that a good prompt for the chat model is enough. I assume that's based on their experience; personally, I don't know.

@Void-n-Null

Very excited to try this out!!!

@moro-n0-kimi commented May 19, 2023

Looking forward to using this to parse plain-text outputs from CoCa into JSON for image captioning.

Update: Getting really good results so far using text-davinci-003

@martinezpl
Author

Awesome to hear that @moro-no-kimi 🥳

@tv-ankur

@moro-no-kimi

Are you using jsonformer with the OpenAI model? If so, is it possible to share the code?

@martinezpl
Author

This is kind of obsolete now with the function calling feature from OpenAI.
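
For comparison, a minimal sketch of that function calling approach using the 2023-era openai-python (< 1.0) ChatCompletion API; the function name and schema are illustrative, not taken from this repo.

```python
import json
import openai

# Minimal sketch of OpenAI function calling (2023-era openai-python < 1.0);
# the function name and schema are illustrative.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=[{"role": "user", "content": "Generate a person's information."}],
    functions=[{
        "name": "make_person",
        "description": "Produce a person record.",
        "parameters": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "age": {"type": "number"},
                "is_student": {"type": "boolean"},
            },
            "required": ["name", "age", "is_student"],
        },
    }],
    function_call={"name": "make_person"},  # force the model to call this function
)
arguments = response["choices"][0]["message"]["function_call"]["arguments"]
print(json.loads(arguments))  # the arguments string is JSON matching the schema
```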

@wassname

This is great work, but it would complicate the repo, which is nice and simple.

This list includes quite a few other libraries that support API-only models: https://github.com/wassname/awesome-interpretability/tree/main?tab=readme-ov-file#structured-output
