.Net: fix: array query parameters with single entries #9771

Open. Wants to merge 3 commits into main.
@@ -30,7 +30,12 @@ public static JsonNode Convert(string name, string type, object argument)
     "string" => JsonValue.Create(argument),
     "array" => argument switch
     {
-        string s => JsonArray.Parse(s) as JsonArray,
+#if NETSTANDARD2_1_OR_GREATER
+        string s when s.Trim().StartsWith('[') => JsonArray.Parse(s) as JsonArray,
+#else
+        string s when s.Trim().StartsWith("[", StringComparison.OrdinalIgnoreCase) => JsonArray.Parse(s) as JsonArray,
+#endif
+        string s => [JsonValue.Create(s)],
Member commented:
I wonder which scenario requires the string -> JSONArray conversion. Is it when the LLM provides a string argument to a parameter of an array type? I'm not sure SK should perform this conversion, and I would argue that it's the caller's responsibility to provide a valid JSON array as an argument for the parameter of array type. If the LLM is returning the string, the error details should be sent back to it so it can make another call with the correct argument type - this is already implemented in SK.

My concern is that if we start doing the conversion because the LLM sometimes provides invalid JSON values, sooner or later we will find ourselves in a tricky situation when we are asked, for example, to create an array element for each word in the string - "v1 v2 v3," "v1, v2, v3," or "v1;v2;v3." How are we going to support that without introducing any behavioral change? We were creating an array with one element from the string, but now we would have an array with an element per word in the string. Will we add an extra option/execution parameter that allows us to tweak the conversion behavior? What if different string -> array conversion flavors are needed within the same operation or in different operations imported from the same document?

I think it's safer not to perform that conversion at the SK level and allow the LLM to self-correct based on function invocation errors. If an extra request to the LLM is not desirable, then consumers can easily decorate each operation with a delegate - How to transform a KernelFunction that would handle the conversion of arguments provided by the LLM based on the target type from the operation metadata and call original function with valid arguments.

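The decoration approach suggested above can be sketched roughly. This is an illustrative sketch only, not SK's actual API surface: it assumes Semantic Kernel 1.x names (`KernelFunctionFactory.CreateFromMethod`, `KernelArguments`, `KernelParameterMetadata.Schema`), and the `IsArrayParameter` helper and the naive string-wrapping step are hypothetical.

```csharp
// Illustrative sketch: wrap an OpenAPI-imported KernelFunction so that a bare
// string argument supplied for an array-typed parameter is coerced into a
// single-element JSON array before the original function is invoked.
// Names assume Semantic Kernel 1.x; adjust to the version in use.
using Microsoft.SemanticKernel;

public static class ArrayArgumentFixer
{
    public static KernelFunction WrapWithArrayCoercion(KernelFunction inner)
    {
        return KernelFunctionFactory.CreateFromMethod(
            async (Kernel kernel, KernelArguments args) =>
            {
                foreach (var parameter in inner.Metadata.Parameters)
                {
                    if (IsArrayParameter(parameter) &&
                        args.TryGetValue(parameter.Name, out var value) &&
                        value is string s &&
                        !s.TrimStart().StartsWith("["))
                    {
                        // Naive wrapping for illustration; real code must JSON-escape s.
                        args[parameter.Name] = $"[\"{s}\"]";
                    }
                }
                return await inner.InvokeAsync(kernel, args);
            },
            functionName: inner.Name);
    }

    // Hypothetical helper: detect array-typed parameters from the JSON Schema metadata.
    private static bool IsArrayParameter(KernelParameterMetadata parameter) =>
        parameter.Schema?.RootElement.TryGetProperty("type", out var t) == true &&
        t.GetString() == "array";
}
```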
baywet (Member, Author) commented on Nov 20, 2024:
I don't believe this is a matter of the LLM generating "the wrong JSON" but rather how the response is being parsed, hence the fix.

For the following prompt, "using only the subject, recap my last 5 emails", it eventually builds "createdDateTime desc" as the string value here for the $orderby query parameter, which is correct.

- name: $orderby
  in: query
  description: Order items by property values
  style: form
  explode: false
  schema:
    uniqueItems: true
    type: array
    items:
      type: string
      enum:
        - id
        - id desc
        - createdDateTime
        - createdDateTime desc

However, I do agree this could be a slippery slope, especially for request and response bodies, so maybe we limit those graceful upgrades to query/path parameters and headers?

What's probably causing the misalignment here is the `style: form` setting and the transformation that already gets applied after parsing. We could alternatively accept the LLM's response as-is, since it's correct, and remove the parsing altogether, but this might impact reliability in other cases?

I don't think letting the LLM brute-force its way through by adding extra square brackets is a productive use of resources here.

Member commented:
@baywet The example you show is serialization of URL parameters, per RFC 6570 rules. I don't think we can mix URL parameter serialization and JSON; they follow very different rules. With RFC 6570, whether the parameter value is a simple string or an array of strings, it serializes correctly.

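For illustration, RFC 6570 form-style expansion with `explode: false` can be sketched in a few lines; `FormStyleSerializer` here is a hypothetical name, not part of SK, and this sketch ignores template-level details such as reserved expansion.

```csharp
// Minimal sketch of RFC 6570 form-style expansion with explode=false:
// whether the value is one string or many, the query string comes out well formed,
// with items percent-encoded individually and joined by commas.
using System;
using System.Collections.Generic;
using System.Linq;

public static class FormStyleSerializer
{
    public static string Serialize(string name, IEnumerable<string> values) =>
        $"{Uri.EscapeDataString(name)}={string.Join(",", values.Select(Uri.EscapeDataString))}";
}
```

For example, `Serialize("$orderby", new[] { "createdDateTime desc" })` and `Serialize("$orderby", new[] { "id", "createdDateTime desc" })` both produce a valid `%24orderby=...` pair without any JSON involved.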
baywet (Member, Author) commented:
Which begs the question: why does the pipeline to generate query parameters value go through a JSON parsing in the first place?

Member commented:
> For the following prompt using only the subject, recap my last 5 emails it eventually builds createdDateTime desc as the string value here for the $orderBy query parameter, which is correct.

Who builds it, LLM or SK? If it's the LLM that provides "createdDate desc" string for a parameter of an array type, then it's incorrect because the "createdDate desc" string is not a valid JSON array (see https://datatracker.ietf.org/doc/html/rfc8259#section-5).

> What's probably causing the mis-alignment here is the style from and the transformation that already gets applied post parsing. We could alternatively accept the LLMs response as is since it's correct and remove the parsing all together, but this might impact reliability in other cases?

To consider the LLM response correct, it must conform to the parameter specification, which requires the parameter to be of array type; every response that deviates from this is incorrect, and accepting it only pushes the issue further down the invocation chain: either parameter serialization fails to serialize the value in accordance with the specified format, or the REST API service itself receives a wrongly formatted query string if the serialization step is dropped or produces the wrong result.

Parsing is required because SK supports non-LLM scenarios as well, where parameter arguments can be provided as a JSON string or a .NET type. For example, the argument for the orderby parameter can be either a JSON array - ["id desc", "createdDateTime asc"] - or an instance of a .NET List type, such as new List<string>() { "id desc" }.

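The branching the PR introduces can be illustrated in isolation; `ArrayCoercion.ToJsonArray` is a hypothetical name for this sketch, not SK's `OpenApiTypeConverter`, and the version-specific `StartsWith` overloads from the diff are collapsed into one for brevity.

```csharp
// Sketch of the coercion the PR adds: a string that looks like a JSON array is
// parsed as one; any other string becomes a single-element JSON array.
using System.Text.Json.Nodes;

public static class ArrayCoercion
{
    public static JsonArray ToJsonArray(string argument) =>
        argument.TrimStart().StartsWith("[")
            // Parse strings that already look like JSON arrays.
            ? JsonNode.Parse(argument) as JsonArray ?? new JsonArray()
            // Wrap anything else as a one-element array.
            : new JsonArray(JsonValue.Create(argument));
}
```

So `ToJsonArray("[1, 2, 3]")` round-trips as a three-element array, while `ToJsonArray("createdDate desc")` becomes `["createdDate desc"]`.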
> I don't think letting the LLM brute force through adding extra square brackets is a productive use of resources here.

> ... If an extra request to the LLM is not desirable, then consumers can easily decorate each operation with a delegate - How to transform a KernelFunction that would handle the conversion of arguments provided by the LLM based on the target type from the operation metadata and call original function with valid arguments.

> Which begs the question: why does the pipeline to generate query parameters value go through a JSON parsing in the first place?

To simplify the parameter serialization functionality, which would otherwise be cluttered with the parsing and validation logic required to format the parameters, as well as the if/else logic needed to determine whether the argument is JSON or an instance of a .NET type.

BTW: There's no need to specify "$" in the "$orderby" parameter name, SK will do it for you.

baywet (Member, Author) commented:
Thank you for the additional information.

As far as I understand kernel function transformations, they are deeply specific to a given function, which would not be suitable in our case, since we'll want a solution that works for any function generated from any OpenAPI / Api Manifest / Copilot Plugin document.

> Who builds it, LLM or SK?

I'm not sure; whatever ends up calling this method does. As far as I understand, SK calls the LLM to get the query parameter values generated, but maybe I understand this wrong?

In your previous answer, you seem to be alluding to the fact that we might be able to "force the hand" of the LLM and tell it "no, really, even if you come up with a single result, it needs to be returned as a valid JSON array with a single entry". Or maybe this is me extrapolating. Would you have pointers for that?
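One common way to steer the model toward array-shaped arguments (shown here only as a hand-written, OpenAI-style tool schema with hypothetical names, not something SK emits verbatim) is to advertise the parameter to the LLM as an array with an `items` constraint, so a conforming model returns `["createdDateTime desc"]` rather than a bare string:

```json
{
  "name": "list_messages",
  "parameters": {
    "type": "object",
    "properties": {
      "orderby": {
        "type": "array",
        "items": {
          "type": "string",
          "enum": ["id", "id desc", "createdDateTime", "createdDateTime desc"]
        }
      }
    }
  }
}
```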

         _ => JsonSerializer.SerializeToNode(argument) as JsonArray
     },
     "integer" => argument switch

@@ -111,5 +111,7 @@ public void ItShouldConvertCollections()
     Assert.Equal("[1,2,3]", OpenApiTypeConverter.Convert("id", "array", new Collection<int>() { 1, 2, 3 }).ToJsonString());

     Assert.Equal("[1,2,3]", OpenApiTypeConverter.Convert("id", "array", "[1, 2, 3]").ToJsonString());
+
+    Assert.Equal("[\"createdDate desc\"]", OpenApiTypeConverter.Convert("id", "array", "createdDate desc").ToJsonString());
 }
 }