Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Custom analysis API and InAndOut Implementation #88

Conversation

bhufmann
Copy link
Contributor

@bhufmann bhufmann commented Oct 1, 2024

Update (Oct 28, 0204):
All the PRs below as well as relevant Trace Compass mainline are merged. This PR only has the changes of the InAndOut implementation left.


This PR includes the TSP updates and concrete implementation for customizable analysis. The InAndOutAnalysis has been updated for that. This PR combines multiple commits, where some of them are available as separate PRs. It requires the commits of the following Trace Comapss mainline PR:

eclipse-tracecompass/org.eclipse.tracecompass#164

This PR combines the following commits/PRs:

Additional related changes:

Signed-off-by: Bernd Hufmann [email protected]

@bhufmann bhufmann requested a review from PatrickTasse October 1, 2024 14:08
@bhufmann bhufmann changed the title Output conifg factory impl inandout Custom analysis API and InAndOut Implementation Oct 1, 2024
@bhufmann bhufmann force-pushed the output-conifg-factory-impl-inandout branch from 12ae3cb to efa61cd Compare October 1, 2024 19:29
PatrickTasse
PatrickTasse previously approved these changes Oct 2, 2024
throw new TmfConfigurationException("JSON parameter is not deserialize correctly"); //$NON-NLS-1$
}
SegmentSpecifierConfiguration config = SegmentSpecifierConfiguration.fromJsonString(json);
fSpecifiers = config.getSpecifiers();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that instead of all the changes to pass JSON parameters and deserialize them, we could use the existing map of parameters like this:

    try {
        fSpecifiers = Lists.transform((List) configuration.getParameters().get("specifiers"), specifier -> {
            Map<String, Object> map = (Map) specifier;
            return new SegmentSpecifier(
                    (String) map.get("label"),
                    (String) map.get("inRegex"),
                    (String) map.get("outRegex"),
                    (String) map.get("contextInRegex"),
                    (String) map.get("contextOutRegex"),
                    (String) map.get("classifier"));
        });
        return true;
    } catch (ClassCastException e) {
        Activator.getInstance().logError("can't set analysis config" + e); //$NON-NLS-1$
    }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this example, it looks straight forward. In general, extension have to deserialize the JSON. If we pass an already deserialized JSON in map<String, Object> then extension needs to traverse the map (which could be map of maps>, instead of using deserialization using POJOs. This is a design choice. Not sure which one we should choose.

If we go with the Map approach it would be consistent with the rest of cases in the server. Validation would have to be manual in the extension and we cannot use a validation library for schema validation unless the map is serialized back to JSON and fed to the validator.

If we go with the string, then many existing code needs to be deprecated to remove 2 ways of configuring. With a string, the extension can handle the JSON string as required, POJO, parameter map. But this would be different way of working and serialization/deserialization of json.

I need to think about it a bit more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now we have DataProviderService deserializing the parameters into a JsonNode (in ConfigurationQueryParameters class). Then it converts the JsonNode to string in ITmfDataProviderSource.createDataProviderDescriptors(). Later the analysis module deserializes the parameters string into a SegmentSpecifierConfiguration.

What if we had just left the DataProviderService use the default to deserialize the parameters into a Map. Then nothing would prevent the extension (the analysis module) to still be able to deserialize the parameters into a SegmentSpecifierConfiguration. It just has to get the json string from this at line 89:

        String json = new Gson().toJson(configuration.getParameters());

So some extensions can optionally validate and deserialize the json string, when it's a complex object, and others can just use the map of parameters when it's a simple few values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now we have DataProviderService deserializing the parameters into a JsonNode (in ConfigurationQueryParameters class). Then it converts the JsonNode to string in ITmfDataProviderSource.createDataProviderDescriptors(). Later the analysis module deserializes the parameters string into a SegmentSpecifierConfiguration.

What if we had just left the DataProviderService use the default to deserialize the parameters into a Map. Then nothing would prevent the extension (the analysis module) to still be able to deserialize the parameters into a SegmentSpecifierConfiguration. It just has to get the json string from this at line 89:

        String json = new Gson().toJson(configuration.getParameters());

So some extensions can optionally validate and deserialize the json string, when it's a complex object, and others can just use the map of parameters when it's a simple few values.

I think you are right. We should pass a Map<String, Object> from the server towards ITmfDataProviderSource.createDataProviderDescriptors() (and ITmfConfigurationSource.create()) and not use string. This would make it consistent to other TSP endpoints, avoids multiple serialization/deserialization steps, avoids having to deprecate, manage 2 APIs during deprecation period and manage removal of old API (in global). Overall, the server API will be simplier.

Extension can use map lookups and if wanted convert it to a JSON string and feed it to a POJO and custom deserializer, as showed in your example. using Gson.

@abhinava-ericsson FYI, the API will use default Map<String, Object> for JSON objects. The pull requests will be updated for that.

@@ -50,23 +58,79 @@ public class InAndOutAnalysisModule extends InstrumentedCallStackAnalysis {
*/
public static final String JSON = ".config.json"; //$NON-NLS-1$

private @Nullable ITmfConfiguration fConfiguartion;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fConfiguration

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

.setName("InAndOut config") //$NON-NLS-1$
.setDescription("Custom In And Out").build(); //$NON-NLS-1$

ITmfConfiguration config = TmfJsonConfiguration.fromJsonString(jsonParameters, defaultConfig);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I'm a bit confused. According to my understanding of TSP and OutputConfigurationQueryParameters, we should receive the typeId and parameters that could be anything but should comply with the configuration source schema.

So in TmfJsonConfiguration it shouldn´t expect to have id, name, description, sourceTypeId. It should only be the parameters.

Now in this example, the schema for In And Out contains properties: name, description, specifiers. Is this what is supposed to be in the parameters object? Then if name and description is used to populate ITmfConfiguration it is a specific implementation from In And Out, not generic to TmfJsonConfiguration. The sourceTypeId should come from the typeId parameter of this method. I'm not sure where the configuration id should come from in this example.

Edit: Ok I see now that sourceTypeId is expected to be taken from the default config, it's a bit sneaky.
If id, name, description are expected to (optionally) come outside of the parameters object, wouldn´t that need to be described in TSP?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I'm a bit confused. According to my understanding of TSP and OutputConfigurationQueryParameters, we should receive the typeId and parameters that could be anything but should comply with the configuration source schema.

Yes, the typeId needs to be supplied. Maybe it's better if the typeId is part of the arbitrary JSON parameters like the name and description parameters which are required items that each configuration should have. If not available the BE can assign the name and description, but the typeId is mandatory. I thought it was a good idea to move it out, but if think about it more it should be part the JSON object.

So in TmfJsonConfiguration it shouldn´t expect to have id, name, description, sourceTypeId. It should only be the parameters.

My idea was that the ID is generated to make it unique and not bother the client to type and find unique IDs. The unique is generated from the input JSON (incl. name, desc. and typeId) using a string representation of an UUID.

Now in this example, the schema for In And Out contains properties: name, description, specifiers. Is this what is supposed to be in the parameters object? Then if name and description is used to populate ITmfConfiguration it is a specific implementation from In And Out, not generic to TmfJsonConfiguration. The sourceTypeId should come from the typeId parameter of this method. I'm not sure where the configuration id should come from in this example.

Edit: Ok I see now that sourceTypeId is expected to be taken from the default config, it's a bit sneaky. If id, name, description are expected to (optionally) come outside of the parameters object, wouldn´t that need to be described in TSP?

This twisted my mind a bit. The JSON schema of the configuration source type was supposed to describe the extension specific parameters. However, the typeId, name and description are needed for the configuration instance. So, for name and description I thought of being it part of the schema, but the typeID was outside because it's needed to find the configuration source. But thinking about it, maybe typeId, name, description should be there always and the schema should define it (if they cannot be assigned automatically like name or description) for example, for XML analysis the name is the filename.

Do you agree that every configuration source should inlcude typeId, name and description as default parameters? If yes, the OutputConfigurationQueryParameters would only contain a JsonNode (or Map<String, Object>) depending what we decide to use.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My intuition would be to define the mandatory/common configuration properties (id, name, description) outside so that they could all be described in the TSP, while leaving the parameters object as completely opaque and only defined by the specific extension in its schema's 'properties' element.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had that intuition too initially, some implication on the implementation on how the input json is generated in the client and how it can be validated in the server (using a validation library) If we put it outside we will need to get a separate JSON Object to feed the json validator. If the name/description is part of the schema, that JSON object has to include it. If not, the schema can't have it. Both solutions are valid.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with both solutions as long as it's clear in TSP what needs to be included in the map.

How about if the map contains 'name', 'id', 'description' and 'properties' where the value of properties is an object that matches the properties in the JSON schema? I think it would be less risky than having the application-specific elements at the same root level as the mandatory configuration elements, to avoid key name conflict.

@abhinava-ericsson
Copy link
Contributor

abhinava-ericsson commented Oct 4, 2024

InAndOutAnalysis feels to be in conflict with the concept of a configurable Data Provider.
Ideally, if we were to apply the concept afresh, InAndOut would be a config type for each of the 4 data providers.

Or, if we were to support configuration at the experiment level, it would again be a config type, which could be applied to all analysis/ data providers supported by that experiment.

*/
@NonNull
@Schema(required = true, description = "Parameters as specified in corresponding ConfigurationTypeDescriptor. Use key `path` for file URI string values.")
Map<String, Object> getParameters();
@Schema(required = true, description = "Parameters as JSON object as specified in the schema of the corresponding ConfigurationTypeDescriptor or as a Map<String, Object>. Use key `path` for file URI string values.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, this file is only used to create the TSP API from the swagger model. In TSP, won't the parameters always be passed as map object by the client, which is then transported as a JSON string?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this file is only used for swagger documentation.

An Object resolves to a JsonObject in generated swagger documentation as far as I understand.

The client will have to put a map<String, Object> where the key is parameters and the value the actual input .JSON object for this configuration (see below for an example). I kept the parameters map because it is consistent to all the other endpoints. What do you suggest to do it differently? I mean in general in the implementation as well as for the swagger documention.

{
    "parameters": {
        "name": "my InAndOut",
        "description" : "my description",
        "specifiers" : [
            {
            "label":"latency",
            "inRegex":"(\\S*)_entry",
            "outRegex":"(\\S*)_exit",
            "contextInRegex":"(\\S*)_entry",
            "contextOutRegex":"(\\S*)_exit",
            "classifier":"CPU"
            }
        ]
    },
    "typeId": "org.eclipse.tracecompass.incubator.internal.inandout.core.config"
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess what I'm understanding is that at TSP protocol (swagger) level, we don't need to make the difference, the example you give fits both. So we could just describe it here by modifying the original to add 'as specified in the schema of...'. It's not clear to me how the sender would differentiate between sending a JSON object or a Map, or how it would look different over the TSP, so I'm not sure about describing it in the protocol as a choice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A JSON object in swagger boils done to a map if it is deserialized as a map with default deserializer.

@SuppressWarnings("nls")
@Override
public String toString() {
return "QueryParameters [parameters=" + parameters + ", json=" + parameters.toString() + "]";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both of these should print the same string?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I need to update this. I had the class implemented differently and the toString was not updated.

@bhufmann
Copy link
Contributor Author

bhufmann commented Oct 7, 2024

InAndOutAnalysis feels to be in conflict with the concept of a configurable Data Provider. Ideally, if we were to apply the concept afresh, InAndOut would be a config type for each of the 4 data providers.

Why? A given data provider could configure the creation of one or more other data providers. It could even configure data providers with a different provider type. For example, the Linux ThreadStatusDataProvider can spawn a Critical Path data provider. The InAndOutDataProvider actually doesn't have a provider type that has a view type associated with. Its sole purpose is to configure and manage InAndOutAnalysis that spwans multiple views (flamechart, statistics, scatter chart etc.) with the same configuration.

Or, if we were to support configuration at the experiment level, it would again be a config type, which could be applied to all analysis/ data providers supported by that experiment.

That was my initial design proposal, to allow configuration on the experiment level which can spawn custom data providers based on an input configuration. See here (https://github.com/eclipse-cdt-cloud/theia-trace-extension/blob/e5453d1d1439262657e87a4563cc55582fd95aca/doc/adr/0011-tsp-analysis-api.md#configuration-service-per-experiment). It would support the InAndOutAnalysis use case as well as the per data provider design. Both ways are valid and have pros and cons. I think with the per data provider solution make it harder to implement sharing of configurations, but that was not a requirement anymore at this moment.

I went with the per data provider implementation for InAndOut to have an open-source reference example that downstream projects base their implementation on. It also verifies all the added APIs.

Are you suggesting going back to the original design? Now it would be the right time before the whole design gets merged.

@abhinava-ericsson
Copy link
Contributor

InAndOutAnalysis feels to be in conflict with the concept of a configurable Data Provider. Ideally, if we were to apply the concept afresh, InAndOut would be a config type for each of the 4 data providers.

Why? A given data provider could configure the creation of one or more other data providers. It could even configure data providers with a different provider type. For example, the Linux ThreadStatusDataProvider can spawn a Critical Path data provider. The InAndOutDataProvider actually doesn't have a provider type that has a view type associated with. Its sole purpose is to configure and manage InAndOutAnalysis that spwans multiple views (flamechart, statistics, scatter chart etc.) with the same configuration.

My point was that all "InAndOut" does is that it configures the multiple analyses/ views (flamechart, statistics, scatter chart etc.). That's why there is no base InAndOutDataProvider. So, it feels more like a ConfigurationType than an analyses. It can obviously be made to fit (as it has been in this patch).

Are you suggesting going back to the original design? Now it would be the right time before the whole design gets merged.

No, I think the per DataProvider configuration is useful. But, later we can also support a per experiment configuration, which would configure all data providers the experiment supports. I said that because InAndOut analyses seems to do two things:

  • Club multiple analyses together
  • Configures each of those analyses

try {
ITmfConfiguration config = configurationSource.create(params);
ITmfConfiguration config = configurationSource.create(queryParameters.getParameters().toString());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a new API in the configuration source for global configurations. The other api is to pass a map<String, Object> which means the server deserializes the JSON in that map. Since it has no specific deserializer available because it's defined by the configuration source in TC and not known in the server part of the code, it will do a default serialization and it will end up in map of maps etc. The keys will have the same name than the ones in the schema.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

API will be changed to use Map<String, Object> instead of json string (see discussion for reasoning: #88 (comment))

@bhufmann
Copy link
Contributor Author

bhufmann commented Oct 7, 2024

InAndOutAnalysis feels to be in conflict with the concept of a configurable Data Provider. Ideally, if we were to apply the concept afresh, InAndOut would be a config type for each of the 4 data providers.

Why? A given data provider could configure the creation of one or more other data providers. It could even configure data providers with a different provider type. For example, the Linux ThreadStatusDataProvider can spawn a Critical Path data provider. The InAndOutDataProvider actually doesn't have a provider type that has a view type associated with. Its sole purpose is to configure and manage InAndOutAnalysis that spwans multiple views (flamechart, statistics, scatter chart etc.) with the same configuration.

My point was that all "InAndOut" does is that it configures the multiple analyses/ views (flamechart, statistics, scatter chart etc.). That's why there is no base InAndOutDataProvider. So, it feels more like a ConfigurationType than an analyses. It can obviously be made to fit (as it has been in this patch).

Are you suggesting going back to the original design? Now it would be the right time before the whole design gets merged.

No, I think the per DataProvider configuration is useful. But, later we can also support a per experiment configuration, which would configure all data providers the experiment supports. I said that because InAndOut analyses seems to do two things:

  • Club multiple analyses together
  • Configures each of those analyses

Only one analysis is configured that feeds multiple views, not multiple analyses.

"properties": {
"label": {
"type": "string",
"descritpion": "The label of the identifier"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A useful optional field here (apart from type and description) would be a list of values to select elements from. This could be used to provide an option to the user to select instead of type in.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we have to follow the JSON schema specification to see what is possible (https://json-schema.org/). The designer of the schema and extension will have look into that. I know, that you, for example, can specify a regex pattern to limit the input.

@bhufmann
Copy link
Contributor Author

bhufmann commented Oct 8, 2024

InAndOutAnalysis feels to be in conflict with the concept of a configurable Data Provider. Ideally, if we were to apply the concept afresh, InAndOut would be a config type for each of the 4 data providers.

Why? A given data provider could configure the creation of one or more other data providers. It could even configure data providers with a different provider type. For example, the Linux ThreadStatusDataProvider can spawn a Critical Path data provider. The InAndOutDataProvider actually doesn't have a provider type that has a view type associated with. Its sole purpose is to configure and manage InAndOutAnalysis that spwans multiple views (flamechart, statistics, scatter chart etc.) with the same configuration.

My point was that all "InAndOut" does is that it configures the multiple analyses/ views (flamechart, statistics, scatter chart etc.). That's why there is no base InAndOutDataProvider. So, it feels more like a ConfigurationType than an analyses. It can obviously be made to fit (as it has been in this patch).

Are you suggesting going back to the original design? Now it would be the right time before the whole design gets merged.

No, I think the per DataProvider configuration is useful. But, later we can also support a per experiment configuration, which would configure all data providers the experiment supports. I said that because InAndOut analyses seems to do two things:

  • Club multiple analyses together
  • Configures each of those analyses

Only one analysis is configured that feeds multiple views, not multiple analyses.

@abhinava-ericsson, @PatrickTasse
I realized that for InAndOut analysis the remove method is not conceptionally correct and will be confusing for the end user. What mean is that there is one configuration that spawns multiple data providers (which is a valid use case), but when removing user would have to use of the list of data providers to remove the configuration and hence he rest of the data providers. For that use case we will need to have a separate interface (and endpoint) to remove the configuration which then removes the associated data providers. That would be much cleaner:

ITmfDataProviderSource.remove(ITmfTrace, String configID)

The endpoint could be as followed (or similar)

/experiments/{expUUID}/outputs/{outputId}/configs/{configId}

* @return status and the deleted configuration instance, if successful
*/
@DELETE
@Path("/{outputId}/{derivedOutputId}")
Copy link
Contributor Author

@bhufmann bhufmann Oct 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we change to the following? This would require that the derived output know how to delete it.

@Path("/{derivedOutputId}/")

As proposed earlier (#88 (comment)) to have a way to remove a config from the data provider instance which will remove the actual data provider(s) consequently.

@abhinava-ericsson @PatrickTasse WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say @Path("/{outputId}/{derivedOutputId}") is cleaner. In general, it sounds like a nice idea to have all url references to derived output ids under a base output id.

@Tag(name = OCG)
@Consumes(MediaType.APPLICATION_JSON)
@Produces(MediaType.APPLICATION_JSON)
@Operation(summary = "Get the list of outputs for this configuration", responses = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs update

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@bhufmann
Copy link
Contributor Author

InAndOutAnalysis feels to be in conflict with the concept of a configurable Data Provider. Ideally, if we were to apply the concept afresh, InAndOut would be a config type for each of the 4 data providers.

Why? A given data provider could configure the creation of one or more other data providers. It could even configure data providers with a different provider type. For example, the Linux ThreadStatusDataProvider can spawn a Critical Path data provider. The InAndOutDataProvider actually doesn't have a provider type that has a view type associated with. Its sole purpose is to configure and manage InAndOutAnalysis that spwans multiple views (flamechart, statistics, scatter chart etc.) with the same configuration.

My point was that all "InAndOut" does is that it configures the multiple analyses/ views (flamechart, statistics, scatter chart etc.). That's why there is no base InAndOutDataProvider. So, it feels more like a ConfigurationType than an analyses. It can obviously be made to fit (as it has been in this patch).

Are you suggesting going back to the original design? Now it would be the right time before the whole design gets merged.

No, I think the per DataProvider configuration is useful. But, later we can also support a per experiment configuration, which would configure all data providers the experiment supports. I said that because InAndOut analyses seems to do two things:

  • Club multiple analyses together
  • Configures each of those analyses

Only one analysis is configured that feeds multiple views, not multiple analyses.

@abhinava-ericsson, @PatrickTasse I realized that for InAndOut analysis the remove method is not conceptionally correct and will be confusing for the end user. What mean is that there is one configuration that spawns multiple data providers (which is a valid use case), but when removing user would have to use of the list of data providers to remove the configuration and hence he rest of the data providers. For that use case we will need to have a separate interface (and endpoint) to remove the configuration which then removes the associated data providers. That would be much cleaner:

ITmfDataProviderSource.remove(ITmfTrace, String configID)

The endpoint could be as followed (or similar)

/experiments/{expUUID}/outputs/{outputId}/configs/{configId}

@abhinava-ericsson I think there is a solution to make it clearer. @PatrickTasse suggested to return only one data provider descriptor when posting the config. This resulting data provider would be a derived data provider, which may have a graph associated with (e.g. timegraph) or has itself children data providers used for the InAndOut use case. The end user will always use the single derived data provider for removal. In case the derived data provider has children it will also remove the children.

With this we will always have a 1:1 relationship between configuration and derived data provider and the end-user will manage the derived data provider. We don't a separate code path and endpoint anymore as I earlier suggested and it will make things easier.

For the InAndOutAnalysis the hierarchy will look like the following:

--- InAndOut Configurator
    |--- Derived InAndOut (A)
        |--- InAndOut (A) - Flame Chart
        |--- InAndOut (A) - Latency Statistics 
        |--- InAndOut (A) - Latency Table
        |--- InAndOut (A) - Latency vs Time
    |--- Derived InAndOut (B)
        |--- InAndOut (B) - Flame Chart
        |--- InAndOut (B) - Latency Statistics 
        |--- InAndOut (B) - Latency Table
        |--- InAndOut (B) - Latency vs Time

The end-user will use the InAndOut Configurator to create a new InAndOut analysis (e.g.InAndOut (A)) and used Derived InAndOut (A) to remove it. As a nice side effect there will be automatic grouping and which can be visualized in the FE clients.

The API in ITmfDataProviderSource will look like this:

IDataProviderDescriptor create(String typeID, ITmfTrace trace, Map<String, Object> parameters);
void ITmfDataProviderSource.remove(ITmfTrace trace, IDataProviderDescriptor derivedDescriptor);

WDYT?

@abhinava-ericsson
Copy link
Contributor

InAndOutAnalysis feels to be in conflict with the concept of a configurable Data Provider. Ideally, if we were to apply the concept afresh, InAndOut would be a config type for each of the 4 data providers.

Why? A given data provider could configure the creation of one or more other data providers. It could even configure data providers with a different provider type. For example, the Linux ThreadStatusDataProvider can spawn a Critical Path data provider. The InAndOutDataProvider actually doesn't have a provider type that has a view type associated with. Its sole purpose is to configure and manage InAndOutAnalysis that spwans multiple views (flamechart, statistics, scatter chart etc.) with the same configuration.

My point was that all "InAndOut" does is that it configures the multiple analyses/ views (flamechart, statistics, scatter chart etc.). That's why there is no base InAndOutDataProvider. So, it feels more like a ConfigurationType than an analyses. It can obviously be made to fit (as it has been in this patch).

Are you suggesting going back to the original design? Now it would be the right time before the whole design gets merged.

No, I think the per DataProvider configuration is useful. But, later we can also support a per experiment configuration, which would configure all data providers the experiment supports. I said that because InAndOut analyses seems to do two things:

  • Club multiple analyses together
  • Configures each of those analyses

Only one analysis is configured that feeds multiple views, not multiple analyses.

@abhinava-ericsson, @PatrickTasse I realized that for InAndOut analysis the remove method is not conceptionally correct and will be confusing for the end user. What mean is that there is one configuration that spawns multiple data providers (which is a valid use case), but when removing user would have to use of the list of data providers to remove the configuration and hence he rest of the data providers. For that use case we will need to have a separate interface (and endpoint) to remove the configuration which then removes the associated data providers. That would be much cleaner:

ITmfDataProviderSource.remove(ITmfTrace, String configID)

The endpoint could be as followed (or similar)

/experiments/{expUUID}/outputs/{outputId}/configs/{configId}

@abhinava-ericsson I think there is a solution to make it clearer. @PatrickTasse suggested to return only one data provider descriptor when posting the config. This resulting data provider would be a derived data provider, which may have a graph associated with (e.g. timegraph) or has itself children data providers used for the InAndOut use case. The end user will always use the single derived data provider for removal. In case the derived data provider has children it will also remove the children.

With this we will always have a 1:1 relationship between configuration and derived data provider and the end-user will manage the derived data provider. We don't a separate code path and endpoint anymore as I earlier suggested and it will make things easier.

For the InAndOutAnalysis the hierarchy will look like the following:

--- InAndOut Configurator
    |--- Derived InAndOut (A)
        |--- InAndOut (A) - Flame Chart
        |--- InAndOut (A) - Latency Statistics 
        |--- InAndOut (A) - Latency Table
        |--- InAndOut (A) - Latency vs Time
    |--- Derived InAndOut (B)
        |--- InAndOut (B) - Flame Chart
        |--- InAndOut (B) - Latency Statistics 
        |--- InAndOut (B) - Latency Table
        |--- InAndOut (B) - Latency vs Time

The end-user will use the InAndOut Configurator to create a new InAndOut analysis (e.g.InAndOut (A)) and used Derived InAndOut (A) to remove it. As a nice side effect there will be automatic grouping and which can be visualized in the FE clients.

The API in ITmfDataProviderSource will look like this:

IDataProviderDescriptor create(String typeID, ITmfTrace trace, Map<String, Object> parameters);
void ITmfDataProviderSource.remove(ITmfTrace trace, IDataProviderDescriptor derivedDescriptor);

WDYT?

Yes. This definitely sounds like a nicer solution.

@abhinava-ericsson
Copy link
Contributor

abhinava-ericsson commented Oct 10, 2024

InAndOutAnalysis feels to be in conflict with the concept of a configurable Data Provider. Ideally, if we were to apply the concept afresh, InAndOut would be a config type for each of the 4 data providers.

Why? A given data provider could configure the creation of one or more other data providers. It could even configure data providers with a different provider type. For example, the Linux ThreadStatusDataProvider can spawn a Critical Path data provider. The InAndOutDataProvider actually doesn't have a provider type that has a view type associated with. Its sole purpose is to configure and manage InAndOutAnalysis that spwans multiple views (flamechart, statistics, scatter chart etc.) with the same configuration.

My point was that all "InAndOut" does is that it configures the multiple analyses/ views (flamechart, statistics, scatter chart etc.). That's why there is no base InAndOutDataProvider. So, it feels more like a ConfigurationType than an analyses. It can obviously be made to fit (as it has been in this patch).

Are you suggesting going back to the original design? Now it would be the right time before the whole design gets merged.

No, I think the per DataProvider configuration is useful. But, later we can also support a per experiment configuration, which would configure all data providers the experiment supports. I said that because InAndOut analyses seems to do two things:

  • Club multiple analyses together
  • Configures each of those analyses

Only one analysis is configured that feeds multiple views, not multiple analyses.

@abhinava-ericsson, @PatrickTasse I realized that for InAndOut analysis the remove method is not conceptionally correct and will be confusing for the end user. What mean is that there is one configuration that spawns multiple data providers (which is a valid use case), but when removing user would have to use of the list of data providers to remove the configuration and hence he rest of the data providers. For that use case we will need to have a separate interface (and endpoint) to remove the configuration which then removes the associated data providers. That would be much cleaner:

ITmfDataProviderSource.remove(ITmfTrace, String configID)

The endpoint could be as followed (or similar)

/experiments/{expUUID}/outputs/{outputId}/configs/{configId}

@abhinava-ericsson I think there is a solution to make it clearer. @PatrickTasse suggested to return only one data provider descriptor when posting the config. This resulting data provider would be a derived data provider, which may have a graph associated with (e.g. timegraph) or has itself children data providers used for the InAndOut use case. The end user will always use the single derived data provider for removal. In case the derived data provider has children it will also remove the children.
With this we will always have a 1:1 relationship between configuration and derived data provider and the end-user will manage the derived data provider. We don't a separate code path and endpoint anymore as I earlier suggested and it will make things easier.
For the InAndOutAnalysis the hierarchy will look like the following:

--- InAndOut Configurator
    |--- Derived InAndOut (A)
        |--- InAndOut (A) - Flame Chart
        |--- InAndOut (A) - Latency Statistics 
        |--- InAndOut (A) - Latency Table
        |--- InAndOut (A) - Latency vs Time
    |--- Derived InAndOut (B)
        |--- InAndOut (B) - Flame Chart
        |--- InAndOut (B) - Latency Statistics 
        |--- InAndOut (B) - Latency Table
        |--- InAndOut (B) - Latency vs Time

The end-user will use the InAndOut Configurator to create a new InAndOut analysis (e.g.InAndOut (A)) and used Derived InAndOut (A) to remove it. As a nice side effect there will be automatic grouping and which can be visualized in the FE clients.
The API in ITmfDataProviderSource will look like this:

IDataProviderDescriptor create(String typeID, ITmfTrace trace, Map<String, Object> parameters);
void ITmfDataProviderSource.remove(ITmfTrace trace, IDataProviderDescriptor derivedDescriptor);

WDYT?

Yes. This definitely sounds like a nicer solution.

One thing I would say here is that in effect, we are introducing a notion of sub-dataproviders, which is slightly different than derived data providers. We may need to handle it accordingly.

@bhufmann bhufmann force-pushed the output-conifg-factory-impl-inandout branch 2 times, most recently from 363c5d0 to f4bc9b9 Compare October 30, 2024 17:40
@bhufmann bhufmann changed the title Custom analysis API and InAndOut Implementation InAndOut implementation using custom analysis API Oct 30, 2024
@bhufmann bhufmann changed the title InAndOut implementation using custom analysis API Custom analysis API implemenation with InAndOut Analysis Oct 30, 2024
@bhufmann bhufmann changed the title Custom analysis API implemenation with InAndOut Analysis Custom analysis API and InAndOut Implementation Oct 30, 2024
@bhufmann bhufmann requested a review from PatrickTasse October 30, 2024 23:52
Note:
- Used through trace server
- Multiple instances allowed
- configuration is applied to all traces in experiment
- configuration is not applied to experiment
- Eclipse is using legacy InAndOut analysis config which allows one
configuration per trace only and which integrated in the Eclipse UI

[Added] InAndOut Analysis using ITmfDataProviderConfigurator extension

Signed-off-by: Bernd Hufmann <[email protected]>
@bhufmann bhufmann force-pushed the output-conifg-factory-impl-inandout branch from f4bc9b9 to 6aab2cb Compare November 4, 2024 20:46
@bhufmann bhufmann merged commit 794ca85 into eclipse-tracecompass-incubator:master Nov 4, 2024
2 checks passed
@bhufmann bhufmann deleted the output-conifg-factory-impl-inandout branch November 4, 2024 20:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants