-
Notifications
You must be signed in to change notification settings - Fork 198
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gNMI.Set - Config Diff Extension #179
Comments
Can you please elaborate on a few things:
I have a concern that the proposal may not be aligned with the following statement in the spec:
|
Why not do a gnmi.Get and compare that to your planned config? ygot.Diff provides functionality to compare two OC structs. |
Ah, yea, I think the device datastore is an important piece here. The running config may have been created using any combination of CLI, OC, native model, etc. A candidate config is also likely a combination of OC, native model, or even CLI via a union_replace. So from the start we know we are likely not comparing OC to OC or even YANG to YANG. I think from experience we've seen any time we need the device to convert between config types (i.e. CLI -> OC) we introduce the risk of bugs. For this reason, I'd be hesitant to suggest we'd want to do something such as adding an extra step to normalize the entirety of both configs to something common like OC or CLI in an effort to perform this config diff. The ultimate goal here is to ensure the candidate config will result in zero differences to the running config. I think the only source of truth at that point is the device's underlying datastore. I'd suspect what we'd really be asking to compare is the device's running config (as defined by its lowest layer datastore) against the candidate config (as defined by how the device would convert it into the lowest layer datastore). In the case of a diff, that would result in output that perhaps won't be as readable or obvious as a bunch of clean OC paths, but that might be the tradeoff to guarantee and truly give the confidence that the diff is accurate. There shouldn't be any requirements for the output to be machine readable though. Regarding the RPC, I'm not tied to using an extension, it just seemed like an option that would be on the less intrusive side, but if a new RPC is preferred (or even a new gNOI service) I'm open to that. We should give some consideration to the need to have something that matches 1:1 with the SetRequest and wouldn't deviate over time, so that might sway the choice one way or another. |
I'm wondering if you can provide an example of how you plan to use this. If we assume that the device has this extension, and you send a gnmi.set with an update (or replace) OC request
Still, what would be the format of the diff? Are we saying it can be vendor-specific and defined by the vendor? |
Sure, I'll try to explain the use case better and how I'd think operators would use it: =========================================================== Work needed: We'd like to migrate featureA from CLI -> OC. Challenge: Sending a new config to an in-service device is really risky. The migration of featureA to use OC must be a no-op, but guaranteeing this is the challenge. Migration work: First run a "config-diff" by providing the candidate config that sends both featureA and featureB using OC only.
=========================================================== I think your points above highlight why I'd be nervous about the diff happening at any layer other than the lowest layer database in a device. Guaranteeing behavior between whatever higher layer config databases (OC, CLI, etc.) seems problematic as you highlighted. =========================================================== As for the format of the diff, I see it really as just vendor-specific, referencing the lowest layer database. I'd hope the output could be minimally human readable, but the action taken from a diff is probably that dev's on the operator's side need to investigate the config gen. That investigation likely would also involve vendor dev's as well. But the diff's should not have a need for anything fancy or machine readable. |
How does that You can, potentially, find a way to do this via union replace2. But even then, you'll face more challenges: not all config is actually specified via gNMI, according to the bootz specification and configuration namespaces defined there The comlete "running config" is merged from many sources
So now cli Footnotes |
There isn't anything special with the SetRequest here. I also don't view it as enabling/disabling...just doing it with a different mechanism. For example, trying to take the simplest config knob I can think of, the initial config generator may be: Over SSH and via CLI: config hostname lx01wr01.sfo01 Then when we are ready to migrate this config to OC, we could re-work the config generator and instead start doing this going forward (if deemed safe): Over gRPC and via gNMI.Set(): replace: { =========================================================== To some of your other questions:
|
The problem with multiple sources is that you don't know which source contributed the value to the running config. If we stick with your example, and you start with and then send a gnmi set/replace + diff
you can receive a response that the final config is identical, and the change is no-op. But it doesn't guarantee that the first config statement can be replaced with the second. If we modify the OC request to
You would expect to see a diff, but, depending on the specific implementation and configuration approach, it's not guaranteed. You can receive the "zero changes" response and think it is a no-op because the previous config lives in another datastore with higher precedence (e.g. bootconfig). And only when you remove it from the other source, the change will be visible. In both scenarios, you can also receive a grpc error on gnmi set (with or without diff extension). If an implementation uses separate data stores for cli+oc and does not allow overlaps, that would be the result of trying to overwrite the value defined in another data store. |
I think the underlying assumption here is that there is a datastore that each device maintains at the lowest layer (i.e. the datastore that has the highest precedence). This would be where the diff happens. I should note, I did speak with one vendor who indicated this to be the case and seemed like supporting a feature like this would be relatively easy. NETCONF apparently offers a similar feature, although I don't have much knowledge of it, but because of that this vendor seems to have looked into doing this already. That is an interesting scenario though where there might be a zero diff, but if some previous config persists in a datastore, you wouldn't really know its contribution to the final lowest layer datastore until it somehow got removed. The main goal is to know that the candidate config will be a zero diff operation, which it would still do though. There is now the risk that some future change doesn't behave as desired, but I think this could be acceptable (or at least dealt with using another approach). |
NETCONF doesn't offer this as a part of the protocol, but a recent RFC (rfc9144), indeed, introduced a similar (optional) feature, and prior to that, there were a couple of other initiatives. The major difference, however, is that netconf (/ietf nmda) uses explicit and well-defined datastores for all operations (rfc8342), including the This proposal, on the other hand, introduces a layer of abstractions that is likely to produce different results in different implementations and environments, all while possibly not solving the original problem.
Perhaps I don't understand the use case. My assumption was that in a config migration scenario (e.g. you want to replace cli
However, based on the conversation in this thread, it seems that you're ok to validate only the second part (config addition, which may coexist with the old config). Personally, I don't think this solves the problem that you described initially. |
I also think that gnmi.Set should focus on a single concern, namely to modify config in the simplest way as a parallel to Get. If we want to try a config, then maybe a separate call is better:
I haven't looked into NETCONF, but it does sound more well-defined. Somewhat off-topic, my other wishlist for gNMI that could have saved me a lot of headache during development is the ability to do config snapshots and auto-rollback if the config knocks the device offline for some reason. Being able to review the config diff during release qualification would also help greatly to catch programmer errors. Many vendors already support this in CLI, but it may require some thought how to design this in gRPC. I think we would at least need a few more methods to make that possible: Stage/Compare, Commit/Abort/Rollback. It adds operational complexity that it might be a better fit for a new service. Any opinions? @dplore @robshakir |
Let's keep this PR/issue focused on the core topic. We deliberately did not expose these semantics (long-lived candidates, rollback across different transactions) in gNMI. They raise many corner cases that happen when humans are experimenting/during development, but generally are overhead otherwise. I'm sympathetic to the problems described, but do not support adding significant complexity to gNMI. |
Regarding auto-rollback, there is a gnmi extension for commit (and confirm) which is for that purpose. |
It seems there is some consensus to avoid adding to gNMI through an extension. In talking with @robshakir it sounds like a good alternative might be a new gNOI service dedicated to this. I'll explore that. |
Is the plan to deprecate the gnmi_ext.proto and have this new gNOI subservice implement config staging, comparison and rollback? I agree that gnmi extension adds a lot of complexity to gnmi.Set semantics. On the other hand, if we split that to a separate service, then I think limiting the scope of that to just comparing might be a missed opportunity to streamline the config store lifecycle across all vendors. |
gNMI extensions provide more than this capability, so no, we will not deprecate gNMI extensions based on any decision in this issue. |
Notwithstanding gNMI extension, will this new service be a solution to config store lifecycle? Limiting the scope to config comparison would be a missed opportunity to catch up to NETCONF. |
The proposed scope of this issue / requirement for Diff encompasses back end data stores which is a significant expansion of scope for gNMI and what is IMO the root issue preventing this from being a simple addition or extension of gNMI. We want to keep gNMI scoped to an external view of the configuration. One can compare a Get with a Set using ygot.Diff today which returns gNMI formatted notification messages. One could also get the vendor native yang and CLI config as well using CLI. The CLI config will just be an ascii blob which could only be a text based diff on the client side. better than nothing, but not a 'semantic' diff like we get with yang based schema. |
For the record, I don't necessarily object to adding a diff operation to gNMI (via an extension or by other means) But I do have a number of other concerns/questions about the proposal, such as
And these challenges will not disappear if this idea will be moved out to gNOI |
I'd like to propose a new gNMI extension to perform config diff on a device. Details are described below.
Objective
Define an approach to validating that a configuration candidate sent using gNMI.Set() would result in zero configuration differences if applied to a device. If differences are identified, then they would need to be reported back.
Background
In a production environment, migrating to OpenConfig based configuration from a CLI based configuration is a potentially risky activity. When a feature is migrated to OpenConfig, the migration should functionally be a no-op, however, there isn't a guarantee that the OpenConfig matches exactly with its CLI equivalent. The OpenConfig may be accepted and return success. However, a success could mean:
Proper testing would provide a level of confidence for these migrations, however, it puts a burden on the tests being complete and comprehensive. In order to definitively validate that a feature works identically, test cases are required that exhaustively cover all edge cases. Whilst this is a desirable end state - the shorter-term requirement is to validate that existing behaviors have not changed.
Utilizing a defense-in-depth approach, relying on an additional config diff would provide more confidence to migrate configuration.
Additionally, this would be helpful during development of new OpenConfig config generators as developers would have an easy mechanism to compare their code output against existing CLI based config generators.
Proposal
A new mode could be supported within a gNMI.Set() to have the device perform a diff of the provided configuration against the existing configuration. This would result in two outcomes:
In either case, a configuration change is NEVER allowed to be applied to a device in this mode.
Specifically, the proposal of the mechanism to achieve this is to add an "impact" extension to the gNMI.SetRequest. It is suggested to utilize the gNMI.Set() RPC because an important need is to use the exact details within the SetRequest message to define the candidate release. This ensures the configuration sent in the normal mode or in this new proposed mode are identical. The SetRequest message could be reused in another RPC, but that seems more complex than simply having a new impact extension within the existing gNMI.Set().
The text was updated successfully, but these errors were encountered: