job chat: Add a prompt testing process #108
Comments
For the record, I would be happy with a manual test process which goes something like this:
We may also need to factor in drift from the LLM end itself: as, e.g., Anthropic updates its models, I don't know how tightly we can version lock, so we may see some natural variance.
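For what it's worth, the tightest lock the Anthropic API seems to offer is pinning a dated snapshot id rather than a floating alias; the model names below are illustrative, not a recommendation:

```python
# Assumption: pinning a dated snapshot id holds behaviour fixed as far as the API allows,
# while an alias silently tracks model updates and so can introduce drift on its own.
PINNED_MODEL = "claude-3-5-sonnet-20241022"   # dated snapshot
FLOATING_MODEL = "claude-3-5-sonnet-latest"   # alias that follows new releases
```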
I'm not really keen on using notebooks for this; I want a more formal process with better script support in the repo. Here's a bit more detail on the experience I'm looking for:
The idea here is that I can edit the prompt locally and re-run my questions. I can then commit the changes, so that in a PR or git compare I can see the prompt changes and the resulting output. All the questions and answers must be checked in so that we can compare against last time.
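A minimal sketch of what that re-run script could look like, assuming a Python repo, the Anthropic SDK, and a hypothetical `prompts/` + `prompt_tests/` layout (the paths, model id, and token limit are placeholders, not decisions):

```python
#!/usr/bin/env python3
"""Re-run the checked-in test questions against the current prompt and write the
answers back into the repo, so a PR diff shows prompt changes next to output changes."""
from pathlib import Path

import anthropic  # assumes the Anthropic SDK; swap in whichever client the project uses

PROMPT_FILE = Path("prompts/job_chat.txt")      # hypothetical path to the prompt under test
QUESTIONS_DIR = Path("prompt_tests/questions")  # hypothetical: one question per .txt file
ANSWERS_DIR = Path("prompt_tests/answers")      # answers are committed alongside the questions

client = anthropic.Anthropic()                  # reads ANTHROPIC_API_KEY from the environment
prompt = PROMPT_FILE.read_text()

ANSWERS_DIR.mkdir(parents=True, exist_ok=True)
for question_file in sorted(QUESTIONS_DIR.glob("*.txt")):
    question = question_file.read_text()
    reply = client.messages.create(
        model="claude-3-5-sonnet-20241022",     # pin a dated snapshot to reduce drift
        max_tokens=1024,
        system=prompt,
        messages=[{"role": "user", "content": question}],
    )
    answer = reply.content[0].text
    (ANSWERS_DIR / question_file.name).write_text(answer)
    print(f"updated {question_file.name}")
```

Running it after a prompt edit and committing the result would give exactly the question/answer diff described above.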
The thinking with this is that the assistant is too immature to justify an expensive prompt versioning (or testing) process. Better to just tweak the prompt ad hoc for now and watch for changes live.
New prompts should be tested to evaluate their performance and minimise unexpected issues in production. This will likely involve accumulating generated test datasets targeting different issues, as well as using LLM-based evaluation to check whether each test passed (true/false) and produce an overall score.
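A rough sketch of what that LLM-based pass/fail scoring could look like; the judge model, judge prompt, and the `prompt_tests/cases.jsonl` format are assumptions for illustration only:

```python
"""Score a batch of test cases with an LLM judge: each case gets a PASS/FAIL verdict,
and the verdicts are aggregated into a simple score."""
import json
from pathlib import Path

import anthropic

JUDGE_MODEL = "claude-3-5-sonnet-20241022"  # assumed judge model
client = anthropic.Anthropic()              # reads ANTHROPIC_API_KEY from the environment

def judge(question: str, answer: str, criterion: str) -> bool:
    """Ask the judge model whether the answer meets the criterion; expects PASS or FAIL."""
    verdict = client.messages.create(
        model=JUDGE_MODEL,
        max_tokens=10,
        messages=[{
            "role": "user",
            "content": (
                f"Question:\n{question}\n\nAnswer:\n{answer}\n\n"
                f"Criterion: {criterion}\n"
                "Reply with exactly PASS or FAIL."
            ),
        }],
    )
    return verdict.content[0].text.strip().upper().startswith("PASS")

# Hypothetical dataset: one JSON object per line with question, answer, criterion fields.
cases = [
    json.loads(line)
    for line in Path("prompt_tests/cases.jsonl").read_text().splitlines()
    if line.strip()
]
results = [judge(c["question"], c["answer"], c["criterion"]) for c in cases]
print(f"score: {sum(results)}/{len(results)} passed")
```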