Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zwave Controller won't enter exclusion mode, resets instead #4380

Closed
apella12 opened this issue Sep 15, 2024 · 50 comments · Fixed by openhab/openhab-webui#2775
Closed

Zwave Controller won't enter exclusion mode, resets instead #4380

apella12 opened this issue Sep 15, 2024 · 50 comments · Fixed by openhab/openhab-webui#2775
Labels
bug An unexpected problem or unintended behavior of the Core

Comments

@apella12
Copy link

Expected Behavior

Zwave controller should send the exclude node command and wait for response. From the attached logs, this happens correctly on 4.1.3. The last section of the attached document shows the relevant section of the controller configuration update in the Zwave binding.

Zwave Exclusion.txt

Current Behavior

The controller resets and the exclusion mode is not entered. This issue was posted on the Zwave repository, but from the attached file, the problem originates in the core. It appears to start with OH 4.2.0.M1. The logs are with both the core and Zwave in debug mode

Possible Solution

From the data it appears core.thing.internal.ThingManagerImpl initiates the controller reset that causes the exclusion command to not be executed

Steps to Reproduce (for Bugs)

  1. Have OH running with Zwave controller online
  2. From the UI page click exclude devices

Context

Just trying to help

Your Environment

I first duplicated this problem (not the original poster) with OH4.3M1 in docker on Rpi3b with aeotec Zstick. For testing I spun up OH4.1.3 and OH 4.2.0.M1 with the exact same zwave network and controller

@apella12 apella12 added the bug An unexpected problem or unintended behavior of the Core label Sep 15, 2024
@mhilbush
Copy link
Contributor

@apella12 Thanks for logging this issue. I literally ran into the problem yesterday when I tried to exclude a node from my network.

@mhilbush
Copy link
Contributor

It seems a little odd that you correctly enter inclusion mode, but not exclusion mode.

@mhilbush
Copy link
Contributor

Well, I guess it might be because you enter inclusion mode differently (by starting a scan for zwave things) versus the way you enter exclusion mode (from the serial controller thing configuration page).

@rkoshak
Copy link

rkoshak commented Sep 16, 2024

I think this issue belongs on the zwave binding repo and not core.

@mhilbush
Copy link
Contributor

But the binding hasn't changed to the best of my knowledge. I suppose it could be a binding issue if there was a change in core that required a change to the binding.

@cdjackson @J-N-K WDYT?

@mhilbush
Copy link
Contributor

If you look at the file that @apella12 attached to this issue, there's clearly something different happening in core compared to the version that works.

@rkoshak
Copy link

rkoshak commented Sep 16, 2024

But if there was a change in core and the Zwave binding is the only binding that is having problems, should tghe Zwave binding be where the fix is implemented to adjust to the change in core?

@mhilbush
Copy link
Contributor

Yes, maybe. Or it could be that the implementation of the change in core didn't take the zwave binding into account. I really don't know one way or the other. But my opinion is that we should first try to find the change in core that broke the binding (assuming there wasn't a binding change that caused the problem). I did look through the zwave binding commits since late 2023 and I didn't see anything obvious that could've caused the problem.

@apella12
Copy link
Author

apella12 commented Sep 16, 2024

If you key off of the location where the message Controller Configuration update received occurs, in the working version it near the top. In the non-working version, it is after the core changes have already caused the controller to reinitialize (the code in the ZW binding is @Override, so is going to happen after the core code (if I understand correctly).

Also, of note in the non-working version is that the reinitialization of the controller seems to be triggered by the core not liking the channels, not the device parameters. From the Zwave code in the document, only a few parameter changes to the controller require reinitialization and the exclude devices parameter is time sensitive, so is meant to be executed without a controller reinitialization.

If the new OH core standard is that ANY change to the controller (either config parameter or channel) is going to trigger reinitialization, the Exclude device function can't be a controller parameter in the ZW binding. What I think is needed from the OH core is whether the reinitialization for any device change was intended and maybe a discussion of what problem that change was trying resolve. I think the discussion of where the fix needs to occur should depend on those comments.

@mhilbush
Copy link
Contributor

mhilbush commented Sep 16, 2024

Also, of note in the non-working version is that the reinitialization of the controller seems to be triggered by the core not liking the channels, not the device parameters.

Yeah that's the part i find confusing. It's complaining about the Config Description URI for channels (e.g. serial_sof), not config parameters.

@mhilbush
Copy link
Contributor

@apella12 Do you have a feel for the date (or even a date range) when this stopped working correctly?

@apella12
Copy link
Author

Only as noted in the document, 4.1.3 ok, 4.2.0M1 not. So basically, between the 4.1 release and the first 4.2 milestone (excluding any 4.2 M1 changes that were backported to the 4.1.3 patch release).

@mhilbush
Copy link
Contributor

Haha I was trying to remember when the 4.1 release came out. December 2023?

@apella12
Copy link
Author

apella12 commented Sep 16, 2024

Yes December 2023 is my recollection too. I don't think 4.2 M1 was out until late Feb, early March 2024

@mhilbush
Copy link
Contributor

@apella12 I don't know if it's related, but node heal and node reinitialize don't work either.

@digitaldan
Copy link
Contributor

Hi guys , sorry , just reading this on my mobile, so apologies if I'm off the mark ( and for my thumb typing ). This sounds like a bug in the MainUi i fixed a couple weeks ago. See openhab/openhab-webui#2704

And if that’s not the issue, please ignore me.

@mhilbush
Copy link
Contributor

@digitaldan Thanks. Looks like your fix was merged on Aug 25. I'm running a snapshot build from Sep 3, so I believe it should include your fix. However the problem still occurs.

I can try installing the latest snapshot just to confirm for sure, but I can't do that for a few days.

@apella12
Copy link
Author

@digitaldan After reading openhab/openhab-webui#2704 last night I was ready to write a long "thank you" as that described the situation exactly. However, I tried snapshot 4286 and the issue persists. The config change initiates a controller reset instead of just changing the specific "exclude devices" parameter, despite the @Override in the Zwave code.

    @Override
    public void handleConfigurationUpdate(Map<String, Object> configurationParameters)
            throws ConfigValidationException {
        logger.debug("Controller Configuration update received");

        // Perform checking on the configuration
        validateConfigurationParameters(configurationParameters);

        boolean reinitialise = false;

@mhilbush I tested the network Heal on the controller and that worked okay. That is set by a timer, so even if the controller resets it should still occur. However, you are correct that on a node UI neither "reinitialize device" or "heal device" works. As in the issue above, the first command is dispose().
2024-09-27 09:31:07.706 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 5: Handler disposed. Unregistering listener.

@mhilbush
Copy link
Contributor

@apella12 on the node heal and node reinit issue, I confirmed the UI has nothing to do with the problem. I invoked both heal and reinit by sending a config command using the REST API.

@digitaldan
Copy link
Contributor

, I confirmed the UI has nothing to do with the problem. I invoked both heal and reinit by sending a config command using the REST API.

So sending the via the REST api worked ? But using the UI does not ? Sounds like a UI problem unless i am not understanding your statement above ☝️

@apella12
Copy link
Author

I'm confused as well. Can you update the controller to "exclude devices" true, with the REST API?

I'm still stuck on the idea that the order of events is now dispose(), Initialize(), then apply the config, but all these need to apply the config first and not do either dispose() or initialize() unless the binding specifies it.

@mhilbush
Copy link
Contributor

Sorry for not being clear. The problem still occurs when I use the REST API to invoke the node heal and node reinit commands.

@mhilbush
Copy link
Contributor

I'm confused as well. Can you update the controller to "exclude devices" true, with the REST API?

Yes you can do that through the REST API using the config endpoint. If you use developer mode in your browser you can see what the UI sends, then use curl to replicate the POST command.

@brukri
Copy link

brukri commented Sep 29, 2024

Same happens to me when I navigate to the Z-Wave primary controller and click on Exclude Devices. The controller goes offline and online again without entering the exclusion mode.

So I did an investigation on possible workarounds.

I finally manged to put the Z-Wave binding into exclusion mode by executing an config update via REST API on my zwave primary controller thing. Have a look at the screen shot. The configuration payload can be taken from the rest action that is executed by the UI when the Exclude Devices button is pressed.

Important is to set the property "controller_exclude" to true

image

After that I could see the following related log messages:

2024-09-29 22:56:27.214 [DEBUG] [zwave.handler.ZWaveControllerHandler] - Controller Configuration update controller_exclude to true
2024-09-29 22:56:27.216 [DEBUG] [al.protocol.ZWaveInclusionController] - ZWave controller start exclusion
2024-09-29 22:56:37.149 [DEBUG] [al.protocol.ZWaveInclusionController] - ZWave controller end exclusion

Looks like this workaround works fine since I already excluded two nodes successfully.

@mhilbush
Copy link
Contributor

mhilbush commented Sep 30, 2024

@apella12 @digitaldan It looks like Main UI is invoking the thing update REST endpoint (i.e. /rest/things/zwave:serial_zstick:zstick) when you select "Exclude Devices" on the serial controller thing configuration page. That might explain why the overridden handleConfigurationUpdate method is not being called, and why the thing is reinitializing.

In the past, I thought Main UI called the thing config REST endpoint (i.e. /rest/things/zwave:serial_zstick:zstick/config). But my recollection might not be as clear as I would like.

@apella12 Do you still have the working version installed, and can you check which REST endpoint is being invoked when you select "Exclude Devices"? The easiest way to do that is using the browser dev tools where you can see what REST endpoints Main UI is invoking.

@cdjackson Can you weigh in on this?

@mhilbush
Copy link
Contributor

And, I guess that explains why what @brukri did works. He's calling the config update endpoint, not the thing update endpoint.

@cdjackson
Copy link
Contributor

I've not read through all of this, but yes, the exclude devices option is a configuration setting, so I'd expect the UI to call the config endpoint. Unless something changed in the UI, I don't think anything has changed in the way this works in the binding or configuration for a very long time.

I don't know why the UI would call the thing update here - is there some logic in the UI that decides to call the thing update instead of config when the config changes?

@mhilbush
Copy link
Contributor

mhilbush commented Sep 30, 2024

the exclude devices option is a configuration setting, so I'd expect the UI to call the config endpoint

Yeah that's what I thought. And i'm pretty sure it did call the config endpoint, so something must've changed in the UI.

I don't think anything has changed in the way this works in the binding or configuration for a very long time.

Agreed. I looked through the zwave commits and this hasn't changed

@apella12 I think the issue with node heal and node reinit might be different than the issue with exclusion mode. I'll try to explain if I can learn a little more about it.

Edit: I take back what I said about the node heal and node reinit possibly being a different issue. If I'm right about what I said below, in the case of running the node heal or node reinit, the thing will be updated rather than the thing config, which I believe will cause the node heal or node reinit config parameters to actually be changed to true. So, next time you invoke node heal or node reinit for the same node, the config change will be ignored because the binding will think that nothing has changed.

https://github.com/openhab/org.openhab.binding.zwave/blob/main/src/main/java/org/openhab/binding/zwave/handler/ZWaveControllerHandler.java#L306

And, this is in fact the message you see in the log file if you call the config endpoint directly rather than from the UI.

2024-09-30 09:26:35.156 [DEBUG] [rg.openhab.binding.zwave.handler.ZWaveThingHandler] - NODE 149: Configuration update received
2024-09-30 09:26:35.156 [DEBUG] [rg.openhab.binding.zwave.handler.ZWaveThingHandler] - NODE 149: Configuration update ignored action_heal to true (Boolean)
2024-09-30 09:26:35.156 [DEBUG] [rg.openhab.binding.zwave.handler.ZWaveThingHandler] - NODE 149: Configuration update ignored node_id to 149 (BigDecimal)

@mhilbush
Copy link
Contributor

Hmm. This change to webui seemed to happen around the timeframe when device exclusion would've stopped working. @apella12 WDYT?

openhab/openhab-webui@866d9e5#diff-82a2bcdda41989ee81ed1f512cebeea5bccfc5fdaf337c812a5d0eed5d3d44d0R510

@florian-h05 Do you think it's possible that this change might've resulted in the behavior we're seeing. Namely that the thing endpoint is being called for a config change instead of the thing config endpoint?

@mhilbush
Copy link
Contributor

Specifically, I'm wondering if the thing will be marked dirty here, even if it was just the config that changed. If so, then won't thing update endpoint be called even if just the config changed?

https://github.com/openhab/openhab-webui/pull/2329/files#diff-82a2bcdda41989ee81ed1f512cebeea5bccfc5fdaf337c812a5d0eed5d3d44d0R349

@apella12
Copy link
Author

apella12 commented Sep 30, 2024

Disclaimer: As my retirement hobby I have used that time to learn some java, so do not readily understand the Vue code.

I have looked at the PR by @digitaldan and don't see how that separates the Thing config endpoint but assume it does. I also looked through the history of the thing-details.vue including the PR you highlight. There were several PRs addressing "dirty checks" by @florian-h05 in the right timeframe before what I'm assuming was the "fix" by @digitaldan. Where I was stuck was why the "fix" did not work for Z-wave. From the "not working" log, my latest thought was that the Z-wave channels were somehow identified as "dirty". So we are on the same page.

Prior to seeing the latest activity this AM, I was thinking about modifying a Z-wave binding to take out the channels to test this idea. As @cdjackson notes nothing in the Z-wave binding has changed in these areas. I was thinking that there might have been changes in the core file core.thing.internal.ThingManagerImpl that affected the Z-wave channels that then cause them to be marked dirty. I know from another binding I'm working on that the new standard for channel naming is to use "-" not "_", but did not think that was retroactive.

EDIT: separately, but related to my ability to test. I only have the Z-wave binding running with 6 nodes in docker on an Rpi3b with a bare bones OH and for some reason I get a 500 server error when trying to use the thing/config REST API. I had tried what @brukri did a few days ago without success. So need to figure that out first. I have been alternating between docker copies with various OH versions, so switching is not a problem.

@mhilbush
Copy link
Contributor

so do not readily understand the Vue code

Haha me either.

But, I'm guessing that when you call fastDeepEqual on the thing object, it will return not equal even if it's just a thing config value that's different. However, I don't know that for a fact.

OTOH, that easily would explain why the thing is being updated when only a config value was changed.

@apella12
Copy link
Author

I also was stumped by the fastDeepEqual line, but assume it was tested.

@mhilbush
Copy link
Contributor

I also was stumped by the fastDeepEqual line, but assume it was tested.

It probably was tested. And in most situations, it would appear to work.

@mhilbush
Copy link
Contributor

mhilbush commented Sep 30, 2024

Ok, I see the config is deleted in the current version of the file. So why is the thing being updated if just the config is changed?

https://github.com/openhab/openhab-webui/blob/main/bundles/org.openhab.ui/web/src/pages/settings/things/thing-details.vue#L356

@mhilbush
Copy link
Contributor

@digitaldan Seems like your change should've fixed the problem. So maybe there's something else going on here?

@florian-h05
Copy link
Contributor

There is definitely something else going on here ... it seems that there is an issue with Vue reactivity, which leads to the UI not recognizing the change properly and setting configDirty.

@florian-h05
Copy link
Contributor

Okay I have identified the problem:

https://github.com/openhab/openhab-webui/blob/e742d7a13dd870b19bdafa2a4b1d9d21c4bb26ae/bundles/org.openhab.ui/web/src/pages/settings/things/thing-details.vue#L567

modifies the Thing in a way Vue reactivity does not recognize the Thing has changed. The change has to be done through $set.
But there is one more problem:
The button calls save right after modifying the Thing and Vue does not recognize the change fast enough.

florian-h05 added a commit to florian-h05/openhab-webui that referenced this issue Sep 30, 2024
A Thing config action is only supposed to save the config of the Thing, not the whole Thing.

Fixes openhab/openhab-core#4380.

Signed-off-by: Florian Hotze <[email protected]>
@florian-h05
Copy link
Contributor

florian-h05 commented Sep 30, 2024

I now have a fix: openhab/openhab-webui#2775

I have tested the fix on my dev system with the Z-Wave serial controller Thing, and now only the config is saved.
For some explanation what was going wrong, check the changes in the linked UI PR.

@digitaldan
Copy link
Contributor

Thats great news @florian-h05 !

@florian-h05
Copy link
Contributor

@digitaldan Would be great if you could give it a real-life test, as I have only tested with a controller Thing that is always offline (I have no controller).

@brukri
Copy link

brukri commented Sep 30, 2024

Thank you for sorting this out.

In openhab 3.2 the UI called the /things/{thingUID}/config REST endpoint for exclusion. See my comment on another Z-Wave exclusion problem from 2022.
https://community.openhab.org/t/how-to-exclude-a-z-wave-device-with-openhab-3/120127/21

@openhab-bot
Copy link
Collaborator

This issue has been mentioned on openHAB Community. There might be relevant details there:

https://community.openhab.org/t/z-wave-controller-goes-offline-when-executing-exclude-devices-from-the-controllers-thing-page/158209/6

@cdjackson
Copy link
Contributor

Prior to seeing the latest activity this AM, I was thinking about modifying a Z-wave binding to take out the channels to test this idea.

@apella12 I'm not really sure what you are getting at with this. What "channels" do you mean? If you are referring to "channels" for exclusion etc, then these are not channels in the OH definition - they are just configuration options. I don't think making them channels is a good option since you then need to attach an item to the channel in order to control the binding. Or maybe you mean something else when you talk about channels, or maybe I just misunderstand :)

In an ideal world, IMHO these shouldn't really be config parameters either, but back in the day when this was written, there was a discussion around how to do this, and the statement was that "other software" should be used to control this part of systems. I didn't like this, so the only option available at the time was to use configuration to trigger this functionality. Possibly there are better options, or maybe there's now more openness to adding more functionality to the core to improve this sort of integration...

@apella12
Copy link
Author

apella12 commented Oct 1, 2024

@cdjackson : In the attached file in very first post in the "non-working" case there was a lot of core activity on the channels that were not in the "working" case. By channels I'm referring to the serial_ack, serial_oof, serial_can, etc. My thought (last night) was that some new checking in the core was causing the problem. There had been little activity on this issue, and I was feeling I needed to try something.

Since this AM, it appears these channels are not the problem, and the log activity is because the controller was being reinitialized because it was deemed dirty by the webUI changes in early 2024, so I'm hopeful.

Separately I agree it would be cleaner to just have a set of Zwave commands that could be run directly. All this configuration overhead is just to run controller.requestRemoveNodesStart();

@florian-h05
Copy link
Contributor

@cdjackson

Possibly there are better options, or maybe there's now more openness to adding more functionality to the core to improve this sort of integration...

I would propose the use of Thing actions.
There is ongoing work to invoke them in the UI (both from UI-based rules and the Thing details page), this allows to invoke most Thing actions from the UI. There are a few limitations wrt to which data types can be used as action inputs, because these types need to be constructed from serialised values, but I guess the Z-Wave Thing actions won’t run in that limitation.

florian-h05 added a commit to openhab/openhab-webui that referenced this issue Oct 1, 2024
A Thing config action is only supposed to save the config of the Thing,
not the whole Thing.

Fixes openhab/openhab-core#4380.

Signed-off-by: Florian Hotze <[email protected]>
@florian-h05
Copy link
Contributor

@openhab/core-maintainers Can you please close this issue? I have just merged the fix in the UI.

@wborn wborn closed this as completed Oct 1, 2024
florian-h05 added a commit to openhab/openhab-webui that referenced this issue Oct 1, 2024
A Thing config action is only supposed to save the config of the Thing,
not the whole Thing.

Fixes openhab/openhab-core#4380.

Signed-off-by: Florian Hotze <[email protected]>
(cherry picked from commit d78e6d9)
@mhilbush
Copy link
Contributor

mhilbush commented Oct 5, 2024

@florian-h05 I just installed snapshot 4309 and I can confirm that the problem is resolved. Thank you!!

However... This is for anyone who tried to run exclusion while this problem existed.

It's possible/likely that controller_exclude in the Z-Wave Serial Controller thing now is set to true. So, even if you try to run exclusion it won't run because the Z-Wave serial thing believes that controller_exclude is already set to true (hence it will ignore the config change). I fixed it by stopping openHAB and hand-editing the Thing.json in jsondb to change the value of controller_exclude from true to false.

@apella12
Copy link
Author

apella12 commented Oct 5, 2024

You can also edit the controller UI Code tab to false. The controller will reinitialize, but you will then be able to use exclude devices from the Controller UI Thing tab. Note that ANY change on the Controller UI tab will reinitialize the controller.

@mhilbush
Copy link
Contributor

mhilbush commented Oct 6, 2024

I also had to fix action_heal and action_reinit on some of my nodes because those had been set to true as a result of this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug An unexpected problem or unintended behavior of the Core
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants