Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

change the way we report the location of errors #287

Open
iantei opened this issue Feb 17, 2025 · 6 comments
Open

change the way we report the location of errors #287

iantei opened this issue Feb 17, 2025 · 6 comments

Comments

@iantei
Copy link
Collaborator

iantei commented Feb 17, 2025

I have created this new issue to discuss further on this change requirement:

#88 (comment)

Currently, when we report the location of errors, we pass the "parent_key" for reference. It's not quite descriptive enough to identify the error; just on the basis of "key" and "parent_key".

  • What do we want?
    We want to pass JSONPath to the key instead of just a parent key
    This would change the error format to - expected value at foo.bar[3].baz to be bool, instead found '3.14159'
@iantei
Copy link
Collaborator Author

iantei commented Feb 17, 2025

We can use serde_json_path: Struct LocatedNodeList, which is produced by JsonPath::query_located method, to extract the path to the node. And more specifically, we can use exactly_one() to return a single node representing the key

Sample code to show this in our use-case:

use serde_json::json;
use serde_json_path::JsonPath;
use anyhow::Result;
use serde_json::Value;

fn main() -> Result<()> {
    let value: Value = json!({
        "plugin": {
            "input_plugins": [
                {
                    "type": "inject",
                    "format": "toml",
                    "key": "grid_search",
                    "value": {
                        "a": 123456,
                        "b": 126897,
                        "c": {
                            "something": [1, 2, 3]
                        }
                    }
                }
            ]
        }
    });

    let path = JsonPath::parse("$..key");
    let node = path?.query_located(&value).exactly_one()?;
    let node_location = node.location().to_string();
    assert_eq!("$['plugin']['input_plugins'][0]['key']", node_location);
    println!("Locations: {}", node_location);
    Ok(())
}

Output:

Locations: $['plugin']['input_plugins'][0]['key']

Question 1: Is this format of path to the key acceptable? Or should we modify it to something like plugin.input_plugins[0].key format?

Furthermore,
Considering one of the case as an example, where we are using get_config_string, we do not pass the entire serde_json down to the parameters

...
impl InputPluginBuilder for InjectPluginBuilder {
    fn build(
        &self,
        parameters: &serde_json::Value,
    ) -> Result<Arc<dyn InputPlugin>, CompassConfigurationError> {
        let key = parameters.get_config_string(&"key", &"inject")?;
...

rather we have something like this:

{
        "type": "inject",
        "key": [
            12456
        ],
        "format": "toml",
        "value": {
            "a": 123456,
            "c": {
            "something": [
                1,
                2,
                3
            ]
            },
            "b": 126897
        }
}

So, when we query for path to key like let path = JsonPath::parse("$..key"); , we will get the location as : $['key'], since it doesn't have any information about the parent nodes.

Question 2: It seemed to me that we might need to pass the reference to the whole config. But I am not sure, if it would be a good idea to pass the reference to the whole config all the way down, so we can trace the path to key?

@iantei
Copy link
Collaborator Author

iantei commented Feb 17, 2025

@nreinicke @robfitzgerald some inputs on this would be appreciated

@robfitzgerald
Copy link
Collaborator

Question 1: Is this format of path to the key acceptable? Or should we modify it to something like plugin.input_plugins[0].key format?

the string $['plugin']['input_plugins'][0]['key'] is a more JavaScript (or DOM)-flavored way to represent these paths. i personally like the dot-delimited, it's easier to read, but, if it's difficult to find a way to write it that way, then this will suffice.

Question 2: It seemed to me that we might need to pass the reference to the whole config. But I am not sure, if it would be a good idea to pass the reference to the whole config all the way down, so we can trace the path to key?

you know @iantei, the custom JSON extensions you just refactored are actually fairly redundant to the built-in capabilities of serde_json and thiserror. if we were to start refactoring any further, i would want to talk about removing the ConfigJsonExtensions extensions class entirely, since i think that would also solve your problem here.

@iantei
Copy link
Collaborator Author

iantei commented Feb 17, 2025

the string $['plugin']['input_plugins'][0]['key'] is a more JavaScript (or DOM)-flavored way to represent these paths. i personally like the dot-delimited, it's easier to read, but, if it's difficult to find a way to write it that way, then this will suffice.

This is the default way in which the node.location() is returned as string. It should not be difficult to transform into dot-delimited format.

if we were to start refactoring any further, i would want to talk about removing the ConfigJsonExtensions extensions class entirely, since i think that would also solve your problem here.

I agree. If we refactor further, we would likely need not need functions like get_config_section(), which makes a point of removing the ConfigJsonExtensions extensions class altogether.

@robfitzgerald
Copy link
Collaborator

@iantei

making JSONPath a first-class citizen in Compass was also referenced in #193. if you could loop that in with your work i would appreciate it.

here's an example of how in a Compass configuration TOML file we can map fields in the Compass query response objects into CSV columns. this capability is implemented in routee_compass::app::compass::response::csv::CsvMapping (though i think there's a bug in how it deals with Option types). the general idea is that it is an ADT that describes whether a CSV field should be 1) mapped directly from some JSONPath, 2) if we should sum the values across a list of mappings, 3) if that value is optional. in that file, there is a function traverse that could be replaced by serde_json_path's get_path extension method.

pub enum CsvMapping {
    Path(String),
    Sum { sum: Vec<Box<CsvMapping>> },
    Optional { optional: Box<CsvMapping> },
}

from TOML, we can assign a mapping to CSV like this:

[[response_output_policy.policies]]
type = "file"
filename = "output-summary.csv"
[response_output_policy.policies.format]
type = "csv"
sorted = true
[response_output_policy.policies.format.mapping]
error = { optional = "error" }
id = { optional = "request.origin_name" }
name = { optional = "request.name" }
dlon = "request.destination_x"
dlat = "request.destination_y"
olon = "request.origin_x"
olat = "request.origin_y"
model_name = "request.model_name"
distance = "route.traversal_summary.distance"
time = "route.traversal_summary.time"
energy_liquid = { optional = "route.traversal_summary.energy_liquid" }
energy_electric = { optional = "route.traversal_summary.energy_electric" }
distance_cost = "route.cost.distance"
time_cost = "route.cost.time"
energy_liquid_cost = { optional = "route.cost.energy_liquid" }
energy_electric_cost = { optional = "route.cost.energy_electric" }
total_cost = "route.cost.total_cost"
distance_unit = "route.state_model.distance.distance_unit"
time_unit = "route.state_model.time.time_unit"
energy_liquid_unit = { optional = "route.state_model.energy_liquid.energy_unit" }
energy_electric_unit = { optional = "route.state_model.energy_electric.energy_unit" }
distance_weight = "request.weights.distance"
time_weight = "request.weights.time"
energy_liquid_weight = { optional = "request.weights.energy_liquid" }
energy_electric_weight = { optional = "request.weights.energy_electric" }
search_result_size_mib = "search_result_size_mib"

happy to jump on a call to talk it through!

@robfitzgerald
Copy link
Collaborator

after discussion with @iantei we agreed that a more strategic refactor needs to happen to add JSONPaths to error messages, as components in Compass do not have access to the full configuration file at the call site for errors.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants