Remaining problems with LowerTransportLayer.sendBlockAck #535

Open
daretobeorjan opened this issue Oct 3, 2022 · 2 comments
Labels
bug Something isn't working

@daretobeorjan
Contributor

daretobeorjan commented Oct 3, 2022

It seems the previous attempts to resolve this issue didn't completely help.

We've now seen multiple crashes when the mUpperTransportLayerCallbacks.getNode call returns null, causing an unhandled NPE when calling incrementSequenceNumber.
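A minimal sketch of this failure mode, under stated assumptions. The names Node, knownNodes, getNode and nextSeqForAck are hypothetical stand-ins, not the library's real classes or signatures. The node lookup is modelled as a map keyed by unicast address. An address missing from the local map makes the lookup return null, and dereferencing that result to increment the sequence number throws a NullPointerException. This thread does not say which address the real sendBlockAck looks up, so the sketch leaves it as a parameter.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-ins; NOT the library's real classes or signatures.
public class NullNodeSketch {

    static final class Node {
        private int sequenceNumber;

        int incrementSequenceNumber() {
            return ++sequenceNumber;
        }
    }

    // Local view of the network: unicast address -> node.
    static final Map<Integer, Node> knownNodes = new HashMap<>();

    // Plays the role of getNode: returns null for an address this device does not know.
    static Node getNode(int address) {
        return knownNodes.get(address);
    }

    // Unguarded path: dereferences the lookup result directly, as described in the report.
    static int nextSeqForAck(int address) {
        return getNode(address).incrementSequenceNumber();
    }

    public static void main(String[] args) {
        knownNodes.put(0x0001, new Node());
        System.out.println(nextSeqForAck(0x0001)); // prints 1
        try {
            nextSeqForAck(0x0042); // address missing from the local map
        } catch (NullPointerException e) {
            System.out.println("NPE: address unknown locally");
        }
    }
}
```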

Unfortunately, I still can't produce a small sample that reproduces the problem. I think it's triggered because our meshes are usually rather crowded, with lots of messages flying back and forth.

However, would it be possible to add some kind of workaround that would at least prevent the crash? Since the code runs on a separate thread, there is no way for us to catch the exception, so our app just crashes. The easiest option would be to wrap the call in a try/catch and skip sending the ack, but I'm not sure what repercussions that would have on the functionality of the mesh.
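One way the proposed workaround could look is a sketch that uses an explicit null check instead of a broad try/catch. All names here (Node, knownNodes, sendBlockAckGuarded) are hypothetical stand-ins, not the library's API. The guarded method logs and returns false when the node lookup yields null, so no NPE escapes. The cost is that the peer does not get this block ack. Checking for null narrowly, rather than catching NullPointerException, avoids silently masking unrelated bugs further down the same code path. java.util.logging is used only so the sketch runs on a plain JVM; an Android app would more typically log with android.util.Log.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical sketch of a null-guard workaround; names are stand-ins, not library API.
public class GuardedAckSketch {
    private static final Logger LOG = Logger.getLogger(GuardedAckSketch.class.getName());

    static final class Node {
        private int sequenceNumber;

        int incrementSequenceNumber() {
            return ++sequenceNumber;
        }
    }

    // Local view of the network: unicast address -> node.
    static final Map<Integer, Node> knownNodes = new HashMap<>();

    /**
     * Returns true if the ack would be sent, false if it was skipped because the
     * node lookup returned null. Skipping avoids the crash, but the peer does not
     * receive this block ack.
     */
    static boolean sendBlockAckGuarded(int address) {
        Node node = knownNodes.get(address);
        if (node == null) {
            LOG.warning("No node for address 0x" + Integer.toHexString(address) + "; skipping block ack");
            return false;
        }
        int seq = node.incrementSequenceNumber();
        // Building and transmitting the ack PDU with this sequence number is omitted here.
        LOG.fine("Sending block ack with sequence number " + seq);
        return true;
    }

    public static void main(String[] args) {
        knownNodes.put(0x0001, new Node());
        System.out.println(sendBlockAckGuarded(0x0001)); // true
        System.out.println(sendBlockAckGuarded(0x0042)); // false, logged and skipped
    }
}
```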

@roshanrajaratnam
Member

roshanrajaratnam commented Oct 5, 2022

@daretobeorjan I'm currently on paternity leave, but I'll try to help when I have some time.

I remember this edge case being reported some time ago. What does your network setup look like? Do you have more than one provisioner? If so, are all provisioners aware of all the nodes in the network?

Edit:

However, would it be possible to add some kind of workaround that would at least prevent the crash? Since the code runs on a separate thread, there is no way for us to catch the exception, so our app just crashes. The easiest option would be to wrap the call in a try/catch and skip sending the ack, but I'm not sure what repercussions that would have on the functionality of the mesh.

Not sending an ack in time would cause the original message to be retransmitted a number of times, depending on the mesh application layer implementation. This would create unnecessary traffic.
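A simplified model of why a skipped ack costs traffic. It assumes a sender that transmits a segmented message, waits for a block ack, and retransmits until it is acked or a retry limit runs out. The retry limit and the method name transmissions are hypothetical. The model treats the whole message as one unit and ignores per-segment block-ack bitmaps and the actual SAR timers, so it shows the trend rather than exact counts.

```java
// Hypothetical model; not the Mesh Profile's exact SAR behaviour or the library's logic.
public class RetransmitSketch {

    /**
     * Counts how many times a sender transmits a message.
     *
     * @param maxRetransmissions     hypothetical retry limit after the first transmission
     * @param ackArrivesAfterAttempt 1-based attempt after which an ack is received,
     *                               or a value <= 0 if no ack is ever sent
     * @return total number of transmissions
     */
    static int transmissions(int maxRetransmissions, int ackArrivesAfterAttempt) {
        int sent = 0;
        for (int attempt = 1; attempt <= 1 + maxRetransmissions; attempt++) {
            sent++;
            if (attempt == ackArrivesAfterAttempt) {
                break; // acked: the sender stops retransmitting
            }
        }
        return sent;
    }

    public static void main(String[] args) {
        System.out.println(transmissions(3, 1)); // acked on the first try: 1
        System.out.println(transmissions(3, 0)); // never acked: 4
    }
}
```

With a retry limit of 3, a missing ack quadruples the transmissions of that message in this model. On a busy mesh that extra traffic can, in turn, make further collisions more likely.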

@daretobeorjan
Contributor Author

I remember this edge case being reported some time ago. What does your network setup look like? Do you have more than one provisioner? If so, are all provisioners aware of all the nodes in the network?

We do have multiple provisioners most of the time. The most common trigger for us is having two phones online at the same time: when a new device is provisioned on one phone, the other almost always crashes. But I have also reproduced this with only one phone and one provisioner, so it is not strictly limited to that situation.

Edit:

However, would it be possible to add some kind of workaround that would at least prevent the crash? Since the code runs on a separate thread, there is no way for us to catch the exception, so our app just crashes. The easiest option would be to wrap the call in a try/catch and skip sending the ack, but I'm not sure what repercussions that would have on the functionality of the mesh.

Not sending an ack in time would cause the original message to be retransmitted a number of times, depending on the mesh application layer implementation. This would create unnecessary traffic.

Well, yes, but the app crashing completely with no way to catch the exception isn't much better. :)
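Plain Java threads show why a caller-side try/catch does not help. An exception thrown inside a Runnable on another thread never propagates back to the code that started that thread. It goes to that thread's uncaught-exception handler instead. The sketch below does not use the library. It assumes, as described earlier in this thread, that the ack path runs on a separate worker thread. The thread name and the Outcome type are illustrative only.

```java
import java.util.concurrent.atomic.AtomicReference;

// General Java illustration; the worker and task are hypothetical stand-ins
// for whatever thread the library actually uses.
public class WorkerThreadExceptionSketch {

    static final class Outcome {
        final boolean caughtByCaller;
        final Throwable seenByHandler;

        Outcome(boolean caughtByCaller, Throwable seenByHandler) {
            this.caughtByCaller = caughtByCaller;
            this.seenByHandler = seenByHandler;
        }
    }

    /** Runs task on a new thread and reports where a thrown exception ends up. */
    static Outcome run(Runnable task) {
        AtomicReference<Throwable> seenByHandler = new AtomicReference<>();
        boolean caughtByCaller = false;

        Thread worker = new Thread(task, "worker");
        // Only the code that creates the thread can install a per-thread handler like this.
        worker.setUncaughtExceptionHandler((thread, error) -> seenByHandler.set(error));

        try {
            worker.start(); // returns immediately; the task runs on the worker thread
        } catch (RuntimeException e) {
            caughtByCaller = true; // not reached for exceptions thrown inside the task
        }

        try {
            worker.join(); // the handler runs on the worker before it terminates
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new Outcome(caughtByCaller, seenByHandler.get());
    }

    public static void main(String[] args) {
        Outcome outcome = run(() -> {
            throw new NullPointerException("node lookup returned null");
        });
        System.out.println("caught by caller: " + outcome.caughtByCaller); // false
        System.out.println("seen by handler: " + outcome.seenByHandler);
    }
}
```

An app could install Thread.setDefaultUncaughtExceptionHandler to log such crashes. That only observes the failure after the fact, though, and on Android the platform's default handler ends the process. A guard inside the library itself therefore seems the more practical fix.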

philips77 added the bug label Oct 21, 2022