Currently the OpenAI integration (#577) does not send any details about the user's schema or data in the prompt to OpenAI. We should explore sending the schema, or at least the database and table names, along with the prompt, which should result in more relevant KQL queries. This should be conditional and disabled by default. The UX is unsolved here, but maybe a checkbox in the header to include schema details works well enough?
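A minimal sketch of how the conditional schema context could work, assuming a checkbox state and a list of table descriptors (all names here are hypothetical, not the integration's actual API):

```javascript
// Hypothetical: append database/table context to the prompt only when the
// user has opted in via the (proposed) "include schema" checkbox.
function buildPrompt(userQuestion, { includeSchema = false, tables = [] } = {}) {
  if (!includeSchema || tables.length === 0) {
    return userQuestion; // default: send nothing about the user's data
  }
  const schemaContext = tables
    .map((t) => `Table ${t.name}(${t.columns.join(', ')})`)
    .join('\n');
  return `Given these tables:\n${schemaContext}\n\nWrite a KQL query: ${userQuestion}`;
}

console.log(buildPrompt('count errors per hour', {
  includeSchema: true,
  tables: [{ name: 'Logs', columns: ['Timestamp', 'Level', 'Message'] }],
}));
```

Keeping the default path identical to today's behavior makes the feature strictly opt-in, which matters given the cost and privacy concerns below.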
Potential issues
Cost
The OpenAI API charges per 1k tokens sent (and more if the user uses GPT-4). Including the schema, or even just the table names, in the prompt potentially introduces many more tokens than the user anticipates. We need to avoid surprise charges from API use. At the very least we should warn the user about this possibility, or better, estimate how many tokens will be sent before the request is actually made.
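A pre-flight cost estimate could be as simple as multiplying the token count by a per-1k rate. The rates below are placeholder assumptions (OpenAI's pricing changes; check their pricing page before surfacing any number to the user):

```javascript
// Assumed per-1k-token prompt prices in USD -- illustrative only, NOT current pricing.
const PRICE_PER_1K_TOKENS = {
  'gpt-3.5-turbo': 0.002,
  'gpt-4': 0.03,
};

// Rough cost estimate for a prompt of `tokenCount` tokens under the assumed rates.
function estimateCostUsd(tokenCount, model = 'gpt-3.5-turbo') {
  const rate = PRICE_PER_1K_TOKENS[model];
  if (rate === undefined) throw new Error(`Unknown model: ${model}`);
  return (tokenCount / 1000) * rate;
}
```

Showing something like "this request will send ~2,000 tokens (≈ $0.06 on GPT-4)" before the call would address the surprise-charge concern directly.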
Token limits
There is a max of 4096 tokens for the API. Along with ☝️, we should estimate the number of tokens before sending and alert the user if they've hit the limit before the request is made.
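A sketch of the pre-send guard, assuming the 4096-token limit covers prompt plus completion. The real count should come from an encoder (gpt-3-encoder / tiktoken, as below); the chars÷4 heuristic here is only a rough stand-in:

```javascript
const MAX_TOKENS = 4096; // model context limit from the issue

// Estimate the prompt's token count and check it (plus room for the reply)
// against the limit. chars/4 is a crude English-text heuristic, not an encoder.
function checkTokenBudget(prompt, maxCompletionTokens = 256) {
  const estimatedPromptTokens = Math.ceil(prompt.length / 4);
  return {
    estimatedPromptTokens,
    withinLimit: estimatedPromptTokens + maxCompletionTokens <= MAX_TOKENS,
  };
}
```

The UI could then block (or warn on) sending when `withinLimit` is false, e.g. when a large schema pushes the prompt over budget.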
// Count the tokens in a string passed on the command line, using gpt-3-encoder.
const { encode } = require('gpt-3-encoder')

const string = process.argv[2];
console.log(string);

const encoded = encode(string)
console.log('# of tokens: ', encoded.length)
Compare the results to OpenAI's tiktoken Python module:
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
e=enc.encode("hi there bob")
print(len(e))