Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add prompt snippets support #234220

Open
wants to merge 52 commits into
base: main
Choose a base branch
from

Conversation

legomushroom
Copy link
Member

@legomushroom legomushroom self-assigned this Nov 19, 2024
@vs-code-engineering vs-code-engineering bot added this to the November 2024 milestone Nov 19, 2024
@legomushroom legomushroom force-pushed the legomushroom/prompt-snippet-completions branch from 6f5026f to a927e4e Compare November 19, 2024 21:45

prompt.parts
.forEach((part, i) => {
public async resolveVariables(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly readability improvements in this file, the logic remains the same.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, but I'd prefer to keep PRs focused more narrowly, and the word "mostly" makes it a little harder to review :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah sorry, it was very hard to read even on 27" monitor, so I was formatting it on the go as I was parsing through the code, and then decided to keep it. The same applies for other cases where I split long lines over. Let me know if you think that reverting this would help with the review at this point 👂

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also recommend scoping PRs down where possible, large PRs with many changes make it hard on the reviewers and ultimately delay landing features.

// to go first so that an replacement logic is simple
resolvedVariables
.sort((left, right) => {
assertDefined(
Copy link
Member Author

@legomushroom legomushroom Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These asserts are to omit TS hacks ☺️

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with the ! because we know that these are parts of a prompt and have ranges, but the assert is fine too

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If they always have ranges, we need to update the TS types then?
The asserts just add a more readable error messages here as opposed to a cryptic cannot read 'start' of 'undefined' you'd get with the bang ☺️

) {
super();
this._register(widget.inputEditor.onDidChangeModelContent(e => {
e.changes.forEach(c => {
// Don't mutate entries in _variables, since they will be returned from the getter
this._variables = coalesce(this._variables.map(ref => {
this._variables = coalesce(this._variables.map((ref) => {
if (c.text === `#file:${ref.filenameWithReferences}`) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the only way I made it work and its not ideal, any better ideas? cc @roblourens @joyceerhl

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this trying to do? Is this triggered when that text is inserted all at once?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we add the (+N more) label to a filename after nested references are resolved, this is run and as the result the variables are removed. The logic below infers that the variables text have changed and are hence the variables are no longer valid and need to be removed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we update the variable objects when we update the labels, this if check works, - if the variable text is really the same as the variable should have, do nothing.

endLineNumber: ref.range.endLineNumber,
endColumn: ref.range.endColumn + delta
}
ref.range = {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@roblourens do you recall the main reason for the // Don't mutate entries in _variables.. above? Because the ref is now a class instance with event listeners now, I do need to mutate the variables here 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because those objects will have been previously returned, and other objects will be holding on to them, and not expecting them to change. Maybe there is somewhere that we diff the previous object with the new one, and mutating the object breaks that. There are other ways we can solve that. But I don't quite understand the problem for you

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe there is somewhere that we diff the previous object with the new one

But that would break only if the diff logic relies on the object pointers, doing diffing by-attribute should be unaffected in either case?

There is no really a problem for me, just wanted to make sure that mutating the objects is fine here. We do need to mutate them now because the ref variable is a class instance now rather than a simple object it was before. The class instance can have event listeners registered on it by other code, and it handles nested file references resolution which we don't want to again over and over 🤷

@legomushroom legomushroom force-pushed the legomushroom/prompt-snippet-completions branch from a927e4e to 2fa00b9 Compare November 19, 2024 22:18
Copy link
Member

@roblourens roblourens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not done reviewing but wanted to drop the first round of comments. Overall, when I'm working on a large feature, I usually try to start pushing small PRs to main as early as possible. That's the best way to avoid merge conflicts and get feedback early. Don't be afraid of pushing an incomplete feature behind a hidden setting, as long as it won't break anything. We do that often on this team. I also would rather avoid PRs that make multiple unrelated changes. It just gets a bit hard to review.

* assertDefined(null, new Error('Should throw this error.'))
* ```
*/
export function assertDefined<T>(value: T, error: string | NonNullable<Error>): asserts value is NonNullable<T> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have this in types.ts, assertIsDefined

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is a little bit different since it is an assert and doesn't require variable reassignment. Let me see if I can use the existing one without need to create let variable just for the assertion 🤔

@@ -186,7 +186,7 @@ export interface ITransformer<Original, Transformed> {
error?: IErrorTransformer;
}

export function newWriteableStream<T>(reducer: IReducer<T>, options?: WriteableStreamOptions): WriteableStream<T> {
export function newWriteableStream<T>(reducer: IReducer<T> | null, options?: WriteableStreamOptions): WriteableStream<T> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional argument? We generally use undefined over null

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used null on purpose to make it explicit to prevent subtle silent errors. It is too easy to overlook a variable or a function return value being an undefined and pass it over, hence I wanted to make sure the folks that using this to be very specific.

I do appreciate the work that TS has done with void and not treating the foo?: string and foo: undefined | string the same anymore, but there is still the cases when requiring the explicit null saves some hair on your head 🤗 Do you think it would be fine with undefined? You have more context so happy to revert if you think it won't be a pain in the future.

state: WorkingSetEntryState.Attached,
kind: 'reference',
});

seenEntries.add(child);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a correct fix (cc @joyceerhl) but is this method even related to your new feature? I prefer not grouping unrelated fixes into a PR that's already really big

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one relates to my feature only - the child references is the new logic. The existing logic is now on the line 1086 above now, and that one would still need the fix to be applied.


prompt.parts
.forEach((part, i) => {
public async resolveVariables(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, but I'd prefer to keep PRs focused more narrowly, and the word "mostly" makes it a little harder to review :)

@@ -193,7 +195,18 @@ export class ChatWidget extends Disposable implements IChatWidget {
return { text: '', parts: [] };
}

this.parsedChatRequest = this.instantiationService.createInstance(ChatRequestParser).parseChatRequest(this.viewModel!.sessionId, this.getInput(), this.location, { selectedAgent: this._lastSelectedAgent });
assertDefined(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems unnecessary, this is right after a !this.viewModel check

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point 💥 I've tried to omit the ! in the this.viewModel!.sessionId but seems like that was also unnecessary here anyway ☺️

@@ -1046,6 +1065,29 @@ export class ChatWidget extends Disposable implements IChatWidget {
this.telemetryService.publicLog2<ChatEditingWorkingSetEvent, ChatEditingWorkingSetClassification>('chatEditing/workingSetSize', { originalSize: this.inputPart.attemptedWorkingSetEntriesCount, actualSize: uniqueWorkingSetEntries.size });
}

// factor in nested references of dynamic variables into the implicit attached context
const variableModel = this.getContrib<ChatDynamicVariableModel>(ChatDynamicVariableModel.ID);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the ChatWidget calls to a contrib directly, then it's no longer a "contrib". The point of that model is to have something that can implement a feature on top of an object using its public interface, but is totally decoupled from the object.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So either this should no longer be a 'contrib' or something else needs to change, I think I need to understand the overall thing better to decide.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I needed to get to the variables list to extend the attached context with the resolved nested file references below.
Maybe there is a better place of doing this, or a better place to add the nested file references to? 🤔

Copy link
Member Author

@legomushroom legomushroom Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to ChatInputPart.getAttachedAndImplicitContext() since we add the nested file references as an implicit context atm 🚀 Let me know if that works better.

/**
* A file reference token inside a prompt.
*/
export class FileReference extends BaseToken {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure workbench/common is the right place for these files, it all seems specific to the chat feature, so should it stay under workbench/contrib/chat/...?

* );
* ```
*/
export const randomInt = (max: number, min: number = 0): number => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be a good one for src/vs/base/common/numbers.ts?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the pointer! I definitely need tips on the files structure 🤗

}

if (typeof value === 'string') {
return value.trim().toLowerCase() === 'true';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this also expecting a string vs boolean?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The returned value is unknown and we provide the JSON view of the config, so users may add the "true"(or even "True") value there and get confused? Or we already parse it to a correct boolean value under the hood?

@@ -51,7 +51,9 @@ class InputEditorDecorations extends Disposable {

this.updateInputEditorDecorations();
this._register(this.widget.inputEditor.onDidChangeModelContent(() => this.updateInputEditorDecorations()));
this._register(this.widget.onDidChangeParsedInput(() => this.updateInputEditorDecorations()));
this._register(this.widget.onDidChangeParsedInput(() => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is necessarily an improvement, we use the single-line version a lot

Copy link
Member Author

@legomushroom legomushroom Nov 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is a readability improvement in the sense it makes it more explicit, hence take less brain cycles to parse the syntax.

But the main reason for the change is not readability - it takes less keystrokes to change this code in the future, and you can set a breakpoint immediately during debugging (I think this change is actually a leftover of me debugging that callback 🤷).

Do you know what is the main reason for using this pattern in the codebase? Is it mostly less typing or some motives involved?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I use the original style because it's more compact.

}

// get all file references in the file contents
const references = await this.codec.decode(fileStream.value).consumeAll();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering whether this needs to be a stream? Do you use it as a stream?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's just nice to have the stream since you are working from a file stream, just wondering if there is a case where you would use it that way

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the amount of data we need to parse might be substantial and because filesystem IO is involved, I thought it is important to keep it as a stream so we can do the parsing concurrently.

I have a follow-up task to make the logic here take leverage of that. Beside this line, the decoders are composed(e.g., PromptFileDecoder uses SimpleTokenDecoder) hence fully take advantage of the fact that an underlying decoder is a stream of messages 🚀

@roblourens
Copy link
Member

I can't get this to work- I set the setting, should I see intellisense for #file? If I manually type #file:./test.js, it doesn't include that file.

Also, it seems like this might break normal implicit context?

…ence-recursion-proof and add approptiate tests
…eference object only if `prompt-snippets` config value is set
…ile reference object only if `prompt-snippets` config value is set
@legomushroom legomushroom force-pushed the legomushroom/prompt-snippet-completions branch from 356d78b to dc90f20 Compare November 22, 2024 01:27
@legomushroom
Copy link
Member Author

@roblourens

I can't get this to work- I set the setting, should I see intellisense for #file? If I manually type #file:./test.js, it doesn't include that file.

You mean autocompletion? Yeah it should work just fine, I don't recall it to broken at any point 🤔

Also, it seems like this might break normal implicit context?

It was broken for a bit a while back but should work just fine now, please try again!

dom.$(
'span.chat-implicit-hint',
undefined,
`Current file${this.getReferencesSuffix()}`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be localized

this.widget.refreshParsedInput();
}));
// make sure the variable is updated on filesystem changes
variable.addFilesystemListeners();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit surprised by this, usually the object manages its listeners internally to keep the public interface minimal.

*/
private updateVariableTexts(): void {
for (const variable of this._variables) {
const text = `#file:${variable.filenameWithReferences}`;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I liked the idea of having a distinct file extension for these files, which would also enable implementing attaching prompt snippets as a separate variable type instead of overloading the existing #file: variable and implicit context. I didn't see discussion of this route in the issue you posted but let me know if that was already evaluated as an option and dismissed. For context, we have #kb: for GitHub knowledge bases when using @github. I think that would result in a simpler and more contained implementation of this functionality, and also as a user I personally wouldn't expect to pick a prompt snippet through #file.

@@ -51,7 +51,9 @@ class InputEditorDecorations extends Disposable {

this.updateInputEditorDecorations();
this._register(this.widget.inputEditor.onDidChangeModelContent(() => this.updateInputEditorDecorations()));
this._register(this.widget.onDidChangeParsedInput(() => this.updateInputEditorDecorations()));
this._register(this.widget.onDidChangeParsedInput(() => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I use the original style because it's more compact.


prompt.parts
.forEach((part, i) => {
public async resolveVariables(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also recommend scoping PRs down where possible, large PRs with many changes make it hard on the reviewers and ultimately delay landing features.

* Decoder for the common chatbot prompt message syntax.
* For instance, the file references `#file:./path/file.md` are handled by this decoder.
*/
export class ChatbotPromptDecoder extends BaseDecoder<TChatbotPromptToken, TSimpleToken> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nit, but we don't use chatbot anywhere else in this codebase. For consistency with all our other chat features, I would prefer that we renamed chatbot to just chat in the contents of this file as well as in the directory and file names.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants