-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix ConfigManagingActor
raising unhandled exceptions when file doesn't exist
#1116
Fix ConfigManagingActor
raising unhandled exceptions when file doesn't exist
#1116
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, good catch!
I think it would be good to add a test case for this, as it is not trivial to see why DELETE is handled separately,, but maybe we need to introduce a integration test, not sure if it would be easy to do with mocking.
Just found few more problems and fixed. I am still thinking how to fix @llucax comments and will continue working on it today. |
89f2857
to
5f3413c
Compare
@ela-kotulska-frequenz let me know what do you think about completely ignoring |
I agree! :) Will do proper changes. |
5f3413c
to
91d7328
Compare
|
3ce7400
to
26ce319
Compare
I think so, I think a complete and permanent deletion of the config file should be an error, and when facing that error I think the best option is to continue with the old config, like we do with other errors. I don't see any valid reason to remove a config file completely and permanently. If the file is created again we should be notified and pick up the changes. If we are not notified about the re-creation, then we have a problem, but maybe we can test that (if it is not tested already). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Damn, this got really tricky. Sorry for not being more clear about removing DELETE
from the events we watch. I think doing that could simplify the code even further.
except FileNotFoundError: | ||
# It is ok for config file to don't exist. | ||
_logger.info( | ||
"%s: Config file %s not found. Ignoring it.", self, config_path | ||
) | ||
error_count += 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mmm, interesting. Why not an error? I also wonder if we shouldn't do this actually for any exception raised while reading, parsing and validating the config files. The goal of this is to be as resilient as possible in case of a partial failure (we can only read part of the config. And it is really tricky, sometimes I think if one config file read failed, we should probably skip the update completely for all files, as we might get into an inconsistent config.
I'm starting to feel maybe it would be better to restart the actor if there is an issue reading any of the requested files (i.e. requiring users to create at least and empty config for the files that are requested in the constructor).
But maybe this is a topic for a different PR. But I would make this a _logger.error()
in this PR. And maybe catch all exceptions, not just ValueError
and FileNotFound
(at least all OSError
, not just FileNotFound
as well as any errors that could be railed by tomlib.load()
, maybe it only raises ValueError
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently it works like this:
Every time any config file change - we read all config files. But not all config files might exists and this cover this case.
Luca, this PR was to fix existing bugs, not redesign ConfigManger. I guess it needs redesign and rethinking but maybe in separate PR and after we agree how it should work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So here I log info, if any config file is doesn't exist.
And (code 3 lines below) raises ValueError if no config file exist. (at least one must exist)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Luca, this PR was to fix existing bugs, not redesign ConfigManger. I guess it needs redesign and rethinking but maybe in separate PR and after we agree how it should work.
Yes, 100% agreed, this is also why I said re-evaluating what to do on a failure to read a config file is for another PR.
But catching other types of errors is still a bug that IMHO is in scope for this PR. Since you are catching a new FileNotFound
why not just catching all OSError
s?
So here I log info, if any config file is doesn't exist.
And (code 3 lines below) raises ValueError if no config file exist. (at least one must exist)
Yeah, but that's only if we can't read any config file, if only one failed, no ValueError
will be raised, and the missing file won't be noticed if we are only monitoring WARNING and up. Also it is more consistent to how we handle the TOML parsing ValueError
. We log an error if TOML parsing fails. I don't understand why the inconsistency of making one issue emit an error log and the new one just a info.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok the whole discussion was because I had impression that having not all config files is not an error.
But it looks like it is unexpected so as you suggested I changed severity of this logging to error. :)
Sorry for my misunderstanding!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No it is OK, originally it was thought like that, only one config file should be enough, but I think that's assuming too much, and is hard to know if missing files are errors or expected, so I opened the new issue to change that in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see in the last commit you are changing this. I would not address the new issue (#1120) in this PR, I think it needs a bit more thought and it is not strictly related to the bug fixes in this PR.
26ce319
to
ceef54b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made some completely new comments to make it more clear what should be addressed in this PR, the rest if for a future PR, I already created an issue for that.
Besides the new comment, for me it is still not clear why it is better to create the file watcher in the constructor and not the _run()
.
ceef54b
to
9e558c0
Compare
Becasuse: * not all files in self._config_paths might exists * catching the watchfiles signal and file reading happen in separate async thread and the file could be removed between signal detection and file reading. This prevents unexpected actor crash. Signed-off-by: Elzbieta Kotulska <[email protected]>
`event.path.samefile` raises error if: * `event.pathq` doesn't exists * path `p` doesn't exists In this case it raised exception: * if any file in parent directory was deleted * any path in self._config_paths didn't exist Signed-off-by: Elzbieta Kotulska <[email protected]>
Signed-off-by: Elzbieta Kotulska <[email protected]>
Signed-off-by: Elzbieta Kotulska <[email protected]>
791e369
to
5acf147
Compare
Eliminate recursive actor crashes when config files are missing. The previous implementation suffered from a critical error pattern: - The `_read_config` method is called at the start of the `_run` method and on FileWatcher event. - When no config files existed, an exception would be raised - This exception caused the actor to crash and immediately restart - Restarting triggered the same `_read_config` method - The cycle repeated, creating a persistent crash loop This fix introduces a more robust approach: - Detect missing config files without throwing exceptions - Set up a FileWatcher to monitor for future config file creation - call `_read_config` method as soon as any config file is crated. Signed-off-by: Elzbieta Kotulska <[email protected]>
5acf147
to
d8655e3
Compare
Adding label for breaking changes and assigning to rc1400 as this now breaks the API, but we can do a quick release after it is merged anyway. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Approving.
I'm still undecided if it wouldn't be better to remove the last commit and leave that to be fixed separate after some more thought. But I think it is OK to move forward as it is too and reconsider in the future when #1120 is addressed.
This was detected when I edited config file with
vim
andconfig.toml.swp
file was created and deleted.