-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Custom analysis API and InAndOut Implementation #88
Custom analysis API and InAndOut Implementation #88
Conversation
12ae3cb
to
efa61cd
Compare
...clipse/tracecompass/incubator/internal/inandout/core/config/InAndOutConfigurationSource.java
Outdated
Show resolved
Hide resolved
...out.core/src/org/eclipse/tracecompass/incubator/internal/inandout/core/config/JsonUtils.java
Outdated
Show resolved
Hide resolved
....core/src/org/eclipse/tracecompass/incubator/internal/inandout/core/config/package-info.java
Outdated
Show resolved
Hide resolved
throw new TmfConfigurationException("JSON parameter is not deserialize correctly"); //$NON-NLS-1$ | ||
} | ||
SegmentSpecifierConfiguration config = SegmentSpecifierConfiguration.fromJsonString(json); | ||
fSpecifiers = config.getSpecifiers(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to me that instead of all the changes to pass JSON parameters and deserialize them, we could use the existing map of parameters like this:
try {
fSpecifiers = Lists.transform((List) configuration.getParameters().get("specifiers"), specifier -> {
Map<String, Object> map = (Map) specifier;
return new SegmentSpecifier(
(String) map.get("label"),
(String) map.get("inRegex"),
(String) map.get("outRegex"),
(String) map.get("contextInRegex"),
(String) map.get("contextOutRegex"),
(String) map.get("classifier"));
});
return true;
} catch (ClassCastException e) {
Activator.getInstance().logError("can't set analysis config" + e); //$NON-NLS-1$
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this example, it looks straight forward. In general, extension have to deserialize the JSON. If we pass an already deserialized JSON in map<String, Object> then extension needs to traverse the map (which could be map of maps>, instead of using deserialization using POJOs. This is a design choice. Not sure which one we should choose.
If we go with the Map approach it would be consistent with the rest of cases in the server. Validation would have to be manual in the extension and we cannot use a validation library for schema validation unless the map is serialized back to JSON and fed to the validator.
If we go with the string, then many existing code needs to be deprecated to remove 2 ways of configuring. With a string, the extension can handle the JSON string as required, POJO, parameter map. But this would be different way of working and serialization/deserialization of json.
I need to think about it a bit more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now we have DataProviderService deserializing the parameters into a JsonNode (in ConfigurationQueryParameters class). Then it converts the JsonNode to string in ITmfDataProviderSource.createDataProviderDescriptors(). Later the analysis module deserializes the parameters string into a SegmentSpecifierConfiguration.
What if we had just left the DataProviderService use the default to deserialize the parameters into a Map. Then nothing would prevent the extension (the analysis module) to still be able to deserialize the parameters into a SegmentSpecifierConfiguration. It just has to get the json string from this at line 89:
String json = new Gson().toJson(configuration.getParameters());
So some extensions can optionally validate and deserialize the json string, when it's a complex object, and others can just use the map of parameters when it's a simple few values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now we have DataProviderService deserializing the parameters into a JsonNode (in ConfigurationQueryParameters class). Then it converts the JsonNode to string in ITmfDataProviderSource.createDataProviderDescriptors(). Later the analysis module deserializes the parameters string into a SegmentSpecifierConfiguration.
What if we had just left the DataProviderService use the default to deserialize the parameters into a Map. Then nothing would prevent the extension (the analysis module) to still be able to deserialize the parameters into a SegmentSpecifierConfiguration. It just has to get the json string from this at line 89:
String json = new Gson().toJson(configuration.getParameters());
So some extensions can optionally validate and deserialize the json string, when it's a complex object, and others can just use the map of parameters when it's a simple few values.
I think you are right. We should pass a Map<String, Object> from the server towards ITmfDataProviderSource.createDataProviderDescriptors() (and ITmfConfigurationSource.create()) and not use string. This would make it consistent to other TSP endpoints, avoids multiple serialization/deserialization steps, avoids having to deprecate, manage 2 APIs during deprecation period and manage removal of old API (in global). Overall, the server API will be simplier.
Extension can use map lookups and if wanted convert it to a JSON string and feed it to a POJO and custom deserializer, as showed in your example. using Gson.
@abhinava-ericsson FYI, the API will use default Map<String, Object> for JSON objects. The pull requests will be updated for that.
@@ -50,23 +58,79 @@ public class InAndOutAnalysisModule extends InstrumentedCallStackAnalysis { | |||
*/ | |||
public static final String JSON = ".config.json"; //$NON-NLS-1$ | |||
|
|||
private @Nullable ITmfConfiguration fConfiguartion; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fConfiguration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
.setName("InAndOut config") //$NON-NLS-1$ | ||
.setDescription("Custom In And Out").build(); //$NON-NLS-1$ | ||
|
||
ITmfConfiguration config = TmfJsonConfiguration.fromJsonString(jsonParameters, defaultConfig); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I'm a bit confused. According to my understanding of TSP and OutputConfigurationQueryParameters, we should receive the typeId and parameters that could be anything but should comply with the configuration source schema.
So in TmfJsonConfiguration it shouldn´t expect to have id, name, description, sourceTypeId. It should only be the parameters.
Now in this example, the schema for In And Out contains properties: name, description, specifiers. Is this what is supposed to be in the parameters object? Then if name and description is used to populate ITmfConfiguration it is a specific implementation from In And Out, not generic to TmfJsonConfiguration. The sourceTypeId should come from the typeId parameter of this method. I'm not sure where the configuration id should come from in this example.
Edit: Ok I see now that sourceTypeId is expected to be taken from the default config, it's a bit sneaky.
If id, name, description are expected to (optionally) come outside of the parameters object, wouldn´t that need to be described in TSP?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I'm a bit confused. According to my understanding of TSP and OutputConfigurationQueryParameters, we should receive the typeId and parameters that could be anything but should comply with the configuration source schema.
Yes, the typeId needs to be supplied. Maybe it's better if the typeId is part of the arbitrary JSON parameters like the name and description parameters which are required items that each configuration should have. If not available the BE can assign the name and description, but the typeId is mandatory. I thought it was a good idea to move it out, but if think about it more it should be part the JSON object.
So in TmfJsonConfiguration it shouldn´t expect to have id, name, description, sourceTypeId. It should only be the parameters.
My idea was that the ID is generated to make it unique and not bother the client to type and find unique IDs. The unique is generated from the input JSON (incl. name, desc. and typeId) using a string representation of an UUID.
Now in this example, the schema for In And Out contains properties: name, description, specifiers. Is this what is supposed to be in the parameters object? Then if name and description is used to populate ITmfConfiguration it is a specific implementation from In And Out, not generic to TmfJsonConfiguration. The sourceTypeId should come from the typeId parameter of this method. I'm not sure where the configuration id should come from in this example.
Edit: Ok I see now that sourceTypeId is expected to be taken from the default config, it's a bit sneaky. If id, name, description are expected to (optionally) come outside of the parameters object, wouldn´t that need to be described in TSP?
This twisted my mind a bit. The JSON schema of the configuration source type was supposed to describe the extension specific parameters. However, the typeId, name and description are needed for the configuration instance. So, for name and description I thought of being it part of the schema, but the typeID was outside because it's needed to find the configuration source. But thinking about it, maybe typeId, name, description should be there always and the schema should define it (if they cannot be assigned automatically like name or description) for example, for XML analysis the name is the filename.
Do you agree that every configuration source should inlcude typeId, name and description as default parameters? If yes, the OutputConfigurationQueryParameters would only contain a JsonNode (or Map<String, Object>) depending what we decide to use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My intuition would be to define the mandatory/common configuration properties (id, name, description) outside so that they could all be described in the TSP, while leaving the parameters object as completely opaque and only defined by the specific extension in its schema's 'properties' element.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had that intuition too initially, some implication on the implementation on how the input json is generated in the client and how it can be validated in the server (using a validation library) If we put it outside we will need to get a separate JSON Object to feed the json validator. If the name/description is part of the schema, that JSON object has to include it. If not, the schema can't have it. Both solutions are valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ok with both solutions as long as it's clear in TSP what needs to be included in the map.
How about if the map contains 'name', 'id', 'description' and 'properties' where the value of properties is an object that matches the properties in the JSON schema? I think it would be less risky than having the application-specific elements at the same root level as the mandatory configuration elements, to avoid key name conflict.
InAndOutAnalysis feels to be in conflict with the concept of a configurable Data Provider. Or, if we were to support configuration at the experiment level, it would again be a config type, which could be applied to all analysis/ data providers supported by that experiment. |
*/ | ||
@NonNull | ||
@Schema(required = true, description = "Parameters as specified in corresponding ConfigurationTypeDescriptor. Use key `path` for file URI string values.") | ||
Map<String, Object> getParameters(); | ||
@Schema(required = true, description = "Parameters as JSON object as specified in the schema of the corresponding ConfigurationTypeDescriptor or as a Map<String, Object>. Use key `path` for file URI string values.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly, this file is only used to create the TSP API from the swagger model. In TSP, won't the parameters always be passed as map object by the client, which is then transported as a JSON string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this file is only used for swagger documentation.
An Object resolves to a JsonObject in generated swagger documentation as far as I understand.
The client will have to put a map<String, Object> where the key is parameters
and the value the actual input .JSON object for this configuration (see below for an example). I kept the parameters map because it is consistent to all the other endpoints. What do you suggest to do it differently? I mean in general in the implementation as well as for the swagger documention.
{
"parameters": {
"name": "my InAndOut",
"description" : "my description",
"specifiers" : [
{
"label":"latency",
"inRegex":"(\\S*)_entry",
"outRegex":"(\\S*)_exit",
"contextInRegex":"(\\S*)_entry",
"contextOutRegex":"(\\S*)_exit",
"classifier":"CPU"
}
]
},
"typeId": "org.eclipse.tracecompass.incubator.internal.inandout.core.config"
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess what I'm understanding is that at TSP protocol (swagger) level, we don't need to make the difference, the example you give fits both. So we could just describe it here by modifying the original to add 'as specified in the schema of...'. It's not clear to me how the sender would differentiate between sending a JSON object or a Map, or how it would look different over the TSP, so I'm not sure about describing it in the protocol as a choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A JSON object in swagger boils done to a map if it is deserialized as a map with default deserializer.
@SuppressWarnings("nls") | ||
@Override | ||
public String toString() { | ||
return "QueryParameters [parameters=" + parameters + ", json=" + parameters.toString() + "]"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both of these should print the same string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I need to update this. I had the class implemented differently and the toString was not updated.
Why? A given data provider could configure the creation of one or more other data providers. It could even configure data providers with a different provider type. For example, the Linux ThreadStatusDataProvider can spawn a Critical Path data provider. The InAndOutDataProvider actually doesn't have a provider type that has a view type associated with. Its sole purpose is to configure and manage InAndOutAnalysis that spwans multiple views (flamechart, statistics, scatter chart etc.) with the same configuration.
That was my initial design proposal, to allow configuration on the experiment level which can spawn custom data providers based on an input configuration. See here (https://github.com/eclipse-cdt-cloud/theia-trace-extension/blob/e5453d1d1439262657e87a4563cc55582fd95aca/doc/adr/0011-tsp-analysis-api.md#configuration-service-per-experiment). It would support the InAndOutAnalysis use case as well as the per data provider design. Both ways are valid and have pros and cons. I think with the per data provider solution make it harder to implement sharing of configurations, but that was not a requirement anymore at this moment. I went with the per data provider implementation for InAndOut to have an open-source reference example that downstream projects base their implementation on. It also verifies all the added APIs. Are you suggesting going back to the original design? Now it would be the right time before the whole design gets merged. |
My point was that all "InAndOut" does is that it configures the multiple analyses/ views (flamechart, statistics, scatter chart etc.). That's why there is no base InAndOutDataProvider. So, it feels more like a ConfigurationType than an analyses. It can obviously be made to fit (as it has been in this patch).
No, I think the per DataProvider configuration is useful. But, later we can also support a per experiment configuration, which would configure all data providers the experiment supports. I said that because InAndOut analyses seems to do two things:
|
try { | ||
ITmfConfiguration config = configurationSource.create(params); | ||
ITmfConfiguration config = configurationSource.create(queryParameters.getParameters().toString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a new API in the configuration source for global configurations. The other api is to pass a map<String, Object> which means the server deserializes the JSON in that map. Since it has no specific deserializer available because it's defined by the configuration source in TC and not known in the server part of the code, it will do a default serialization and it will end up in map of maps etc. The keys will have the same name than the ones in the schema.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
API will be changed to use Map<String, Object> instead of json string (see discussion for reasoning: #88 (comment))
Only one analysis is configured that feeds multiple views, not multiple analyses. |
"properties": { | ||
"label": { | ||
"type": "string", | ||
"descritpion": "The label of the identifier" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A useful optional field here (apart from type and description) would be a list of values to select elements from. This could be used to provide an option to the user to select instead of type in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we have to follow the JSON schema specification to see what is possible (https://json-schema.org/). The designer of the schema and extension will have look into that. I know, that you, for example, can specify a regex pattern to limit the input.
@abhinava-ericsson, @PatrickTasse
The endpoint could be as followed (or similar)
|
* @return status and the deleted configuration instance, if successful | ||
*/ | ||
@DELETE | ||
@Path("/{outputId}/{derivedOutputId}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we change to the following? This would require that the derived output know how to delete it.
@Path("/{derivedOutputId}/")
As proposed earlier (#88 (comment)) to have a way to remove a config from the data provider instance which will remove the actual data provider(s) consequently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say @Path("/{outputId}/{derivedOutputId}")
is cleaner. In general, it sounds like a nice idea to have all url references to derived output ids under a base output id.
@Tag(name = OCG) | ||
@Consumes(MediaType.APPLICATION_JSON) | ||
@Produces(MediaType.APPLICATION_JSON) | ||
@Operation(summary = "Get the list of outputs for this configuration", responses = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs update
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@abhinava-ericsson I think there is a solution to make it clearer. @PatrickTasse suggested to return only one data provider descriptor when posting the config. This resulting data provider would be a derived data provider, which may have a graph associated with (e.g. timegraph) or has itself children data providers used for the InAndOut use case. The end user will always use the single derived data provider for removal. In case the derived data provider has children it will also remove the children. With this we will always have a 1:1 relationship between configuration and derived data provider and the end-user will manage the derived data provider. We don't a separate code path and endpoint anymore as I earlier suggested and it will make things easier. For the InAndOutAnalysis the hierarchy will look like the following:
The end-user will use the The API in IDataProviderDescriptor create(String typeID, ITmfTrace trace, Map<String, Object> parameters);
void ITmfDataProviderSource.remove(ITmfTrace trace, IDataProviderDescriptor derivedDescriptor); WDYT? |
Yes. This definitely sounds like a nicer solution. |
One thing I would say here is that in effect, we are introducing a notion of sub-dataproviders, which is slightly different than derived data providers. We may need to handle it accordingly. |
363c5d0
to
f4bc9b9
Compare
Note: - Used through trace server - Multiple instances allowed - configuration is applied to all traces in experiment - configuration is not applied to experiment - Eclipse is using legacy InAndOut analysis config which allows one configuration per trace only and which integrated in the Eclipse UI [Added] InAndOut Analysis using ITmfDataProviderConfigurator extension Signed-off-by: Bernd Hufmann <[email protected]>
f4bc9b9
to
6aab2cb
Compare
794ca85
into
eclipse-tracecompass-incubator:master
Update (Oct 28, 0204):
All the PRs below as well as relevant Trace Compass mainline are merged. This PR only has the changes of the InAndOut implementation left.
This PR includes the TSP updates and concrete implementation for customizable analysis. The InAndOutAnalysis has been updated for that. This PR combines multiple commits, where some of them are available as separate PRs. It requires the commits of the following Trace Comapss mainline PR:
eclipse-tracecompass/org.eclipse.tracecompass#164
This PR combines the following commits/PRs:
Additional related changes:
Signed-off-by: Bernd Hufmann [email protected]