-
-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Better connection lost handling #33
Conversation
- Before this, if there were no internet connection on first connect, it would fail and did not retry. That use-case is now handled. - After a connection failure, channels were not re-subscribed. That is now fixed. - The "Ping" interval was triggered everytime we did tried to connect, even without success. So if we tried to connect 40x, we would have 40 "setInterval" triggered, and 40x ping/minute , instead of one ! That's a bit more production-ready now :)
@leob I think you should have a look at this ;) My WIFI ON/OFF switch had a hard time, being switched back-forth for a while ;) |
@Adesin-fr Thanks, will check it out ! |
@Adesin-fr Okay, I've merged this PR and published a new version to npmjs. Then I went on to create a new build of our app, and deploy and test it - but unfortunately it looks like the web sockets functionality in our app is more or less broken now, I'm not seeing any "notifications" (messages) being delivered over our private channels anymore - nothing is coming through ... Does it work for you? What I noticed is that I seem to be seeing fewer log messages than I would expect - here's a partial screenshot of my JS console: I do see: "just set socketId ..." but I'm not seeing: "Trying to connect..." Not saying that I should be seeing all of these, but shouldn't I be seeing at least some of them? Can you debug this a bit? From the testing that I did with our app this seems to have more or less fallen apart now - maybe the solution turns out to be something very simple, or even something which we need to change in our app, but right now I don't really have a clue (but I admit that I didn't really debug it yet either). (by the way - the only way that I know of to debug this stuff is to go through a full AWS/lamba/app deployment, which makes iterating/debugging this really slow and painful ... not sure if there's a "better" way, I guess not) I also wonder if our "Websocket" implementation needs this much "specific" code, while all we really do is piggyback on top of the standard Laravel Echo protocol - we might be reinventing the wheel a bit here, although I also don't know if there's a better way. P.S. if you want to work on this, please pull master first - I've cleaned up the code a little bit by adding semicolons where they were missing - the JS code in this project was consistently using semicolons, but we now ended up with a mix of "semicolon" and "no semicolon" - I've added the semicolons where they were missing, please let's keep it consistent ... Another thing I did was to use a "logging prefix" for the |
First, thanks for having merged this. That's strange since I don't think it should not break private channels ... To see "trying to connect" messages and others , you must enable debug mode on the Echo instruction (remember the last dockbloc of the readme ?) I'll have a look at this with private channels later today :) PS : have you checked the auth requests on your laravel app ? They must succeed and give back an auth key, that is then used for the websocket things... |
@Adesin-fr Yes, I did enable the debug option, otherwise I wouldn't have seen the messages which I saw ... so, I do see some messages, but I'm not seeing all of the messages that I would expect to see :) Thanks for looking at it! If you can't find anything then let me know, and I'll put a bit more effort into debugging it ... |
@Adesin-fr P.S. please don't forget to pull By the way, I still see some Github jobs failing, e.g. one of the Jest tests - don't know whether that's relevant or not ... but for now I'd say the failing Github jobs are low priority :) |
I've tried to have a look. In my project, I'm seeing the "trying to connect messages and every others. |
@Adesin-fr I've merged your new PR to bring Axios back - I saw that orginally the Axios import wasn't there, then you added it in your previous PR, and I then deleted it again - but I don't know why I deleted it ... no, actually I do know why now - it was because of a build error which I was getting - see comment in the other PR ... But okay, I don't know why you do see those log messages, while I don't see them! I'm going to debug the problem, I think I'm going to add more log statements, including in the PHP code, so that I can see exactly what's going on ... probably not today, but maybe tomorrow - work in progress! P.S. we do have "echo" in our app's
but it's a different (slightly newer) version than the one required by |
@Adesin-fr I was trying to debug my websocket issues - what was just baffling was that when I opened our app in two browsers (Chrome and Firefox) and logged in in both browser windows, I could send a message (via websockets) from "Chrome" to "Firefox", but not the other way around (from "Firefox" to "Chrome") ! Debugging it was an ordeal (I had a hard time to even log anything to CloudWatch), but what I did see was this - a "410 Gone" exception from API Gateway:
So, sending the message to the first connection worked, but sending it to the second connection failed with a "410 Gone" ... And that gave me an idea - "let's look in the Connections table in DynamoDB" - and bingo! When I cleared that table, things started to work ... Pretty clear that there are sometimes "stale" connections left behind in that table - which is strange, because we DO have this piece of code in the
So the idea is that in case of an HTTP 410 Gone" it SHOULD remove that connection from the DynamoDB table ... strange that that doesn't always seem to work. This still looks like a potential weakness - the fact that the Connections table can contain "stale" connections ... I mean, in theory "it should just work", but it feels somewhat fragile (potentially). P.S. I was also struggling with getting any logging to work - I cloned the repo (
and then with:
but it seemed to just refuse to pick up the dependency from my local folder - I did NOT get to see any of the logging that I added in the PHP code - somehow it did not seem to use the "local" version, but the version from Github ... ? And to top it all off I had issues with the AWS S3 file adapter because of the upgraded
Apparently that requires a PHP fix/upgrade: |
Well I did not yet experienced the 410 errors ! We'll surely need to address it soon ! |
@Adesin-fr I've seen these "410" errors more than once in our app, but today I finally found out what causes it: "stale" connections in the DynamoDB "Connections" table! That should never happen, but apparently it did ... clearing the table solved the issue, but if it happens again then we should indeed address it. |
@Adesin-fr Look at this change from another PR (#25): Maybe this change makes it more robust - checking for
instead of ONLY for This could be a reason why in some cases the "stale" connection isn't getting cleared ... I've added this to the latest release. |
@Adesin-fr So, after upgrading my app to the latest version of But, it was easy to fix - I had to remove this line from my frontend app's initialization code:
and instead I had to pass the
Works now! And we're no longer using |
There are some issues with the current version :
That's a bit more production-ready now :)