-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Validate set schema for subsequent calls to SOS store #1532
Conversation
It is possible that a set could be provided to the SOS store() function that has a different schema than what was originally provided. This can crash the ldmsd and/or store incorrect data to the container. This change caches the set digest of the first set in the storage instance and compares this cached value to the set digest provided in the store() function. If these digests do not match, the set is ignored and not stored.
Hi @tom95858, how would this affect attempting to store a changed schema to an older schema even after restart? Additionally, is the best method to just drop the schema data rather than storing to a new schema that can then be used for recovery later? |
BTW: This change only applies to non-decomposition enabled storage. In the case you're referring too, it wouldn't work because the schema already exists in the container and has a different set of attributes. We need to create a new schema that matches the new/removed/renamed/re-typed attributes.
No it's not the best, but we would have to design up a set of policies:
So it would be great if we address the issues above and then write the necessary logic. But absent all that work, the bare minimum behavior is to not crash and at this point that requires dropping the data. |
@tom95858 The patch doesn't handle the case that For your information, storage policies cache set digests. A storage policy won't store the sets that have different digest values from the cached value. Please see ldmsd_strgp.c:662. |
@tom95858 Is it possible for applications to store some SOS schema properties in a SOS database (not the C object)? In this case, I'm talking about storing the set digest in the database. |
@nichamon it doesn't hurt to add your patch, so let's just do it and then we will have our bases covered. Oh and yes, we could store the set digest in the SOS schema, however, SOS is not intended only for LDMS. We also use it for baler. I guess we could add a user-context field of some sort to the schema. |
@nichamon, This code path is during lookup. The set has already been looked up, or it wouldn't be getting updated. The set returned, however is different, or it wouldn't hit your asserts. Whether or not the returned set's digest is different is another question. I don't think I understand how this is all happening exactly, but it is. I suppose it is possible that the problem has been misdiagnosed, but we should at least apply your patch to get more information. |
@tom95858 I can reproduce the assert(), and here is the scenario. I had a sampler daemon (node-1) and an aggregator running. The aggregator stored
On the aggregator node,
Here is a snippet of the I killed both sampler daemon and aggregator. Then, I started the sampler daemon with a modified configuration that changed the type of metric
Then I restarted the aggregator with the same configuration that stores With my patch in PR #1528, ldmsd reported the error messages of mismatched value types instead of the assertion. |
Adding a user-content field to SOS schema and used that field to compare with the digest cached in the store sos handle would cover the case I showed in my previous comment. |
@nichamon this is only for a type mismatch. If a new field gets added (which is the usual use case for us) you won't get this message. We need some behavior to catch not only if the schema changes while the daemon is running but also if the daemon restarts and there is a mismatch. |
I think we can decide in the future on what to do for a new schema creation but we need something where the daemon won't crash and will provide appropriate error messages. |
@bschwal Yes, I'm aware of adding new metrics case. There is a logic in Tom and I also discussed a path forward for comparing the set schema and the SOS schema in the database by storing the set digest or set SHA hash in the SOS database (#1532 (comment)). This will catch any differences between the SOS schema definition in the database and the set schema definition without going through the whole set. |
Hi @nichamon, are you saying this logic already exists? If so it does not work. store_sos does (and I've seen this behavior for at least a year now) crash ldmsd when the situation I described occurs. I like the idea of storing the set digest in the sos database and doing a check on that. |
@bschwal Let me clarify the current and planned behavior: Current behavior:
However, when new metrics are inserted between existing ones, the type mismatch check could trigger first, leading to a crash before the count difference can be detected and safely reported. Proposed solution:
Once #1528 and this PR #1532 are merged, you won't see these crashes anymore. |
Sounds great to me, thanks all! |
It is possible that a set could be provided to the SOS store() function that has a different schema than what was originally provided. This can crash the ldmsd and/or store incorrect data to the container.
This change caches the set digest of the first set in the storage instance and compares this cached value to the set digest provided in the store() function.
If these digests do not match, the set is ignored and not stored.
This change also removes "dead code" that was no longer used after the decomposition changes.