High CPU usage and Will takes 10+ seconds to respond on Slack #384
Comments
It's not immediately clear to me how the context of a command is used by default, so I set the analysis engine to Nothing, and will now responds to every single command instantly. I'll continue observing how will behaves and report back.
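For anyone who wants to try the same workaround, here is a minimal sketch of what that change looks like, assuming the backend module paths from Will 2.x's example config.py (check your own config.py for the exact names in your version):

```python
# config.py -- sketch of swapping the analysis backend; module paths assume
# Will 2.x's layout and may differ in other versions.
ANALYSIS_BACKENDS = [
    "will.backends.analysis.nothing",    # no-op analysis: no history context
    # "will.backends.analysis.history",  # the backend suspected of the CPU spin
]
```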
@Ashex Thank you for posting this. I hit the exact same issue this morning. What's interesting is that this behaviour is not exhibited locally on my development machine: if I launch it and let it connect to Slack, it never hits this high CPU usage. However, when I deploy on the actual server, it hits 100% immediately after the first message, like you describe. If I get time I'd like to run a profiler against this section and see what on earth it's doing.
@dawn-minion Good to know I'm not alone! I have the same situation: no CPU consumption issue on my laptop (OSX High Sierra), only in production. Post back with the results of running a profiler against will (I've never done it, so I wouldn't know where to start)!
@Ashex How strange that it works fine locally for you as well. Ubuntu 18.04 locally and 16.04 on the server. Hopefully I'll have time today to run that, and I'll post the results here too. Thanks again =)
@Ashex So, after a good amount of debugging, I've narrowed it down to this call here. Disabling this line (and the rest of the logic) makes it so there's no abnormal CPU usage; re-enabling only this line in the plugin makes the 100% CPU lock return. Just for fun, I tried dumping the contents here, and it was enormous. I don't have an exact number of entries, but it seemed to go on for some time. Looking at the logic, I don't see any limit on how many messages it stores either. To test whether this was the cause, I cleared the entries via …
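To illustrate the failure mode being described here (a schematic with made-up names, not Will's actual plugin code): if every incoming message is appended to one shared history list in storage with no cap, that list has to be loaded, grown, and re-serialized on every single message, which gets steadily slower and more CPU-hungry.

```python
# Schematic of the unbounded-growth pattern described above -- hypothetical
# storage API and key name, not Will's real code.
def record_message(storage, message):
    history = storage.get("message_history", [])   # loads the entire list every time
    history.append(message)                        # no upper bound on length
    storage.set("message_history", history)        # re-serializes the whole thing
```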
Interesting. I knew it was something in there and only had a hunch about the messages. It's a bit weird, as according to the code it already slices the list down to 19 elements before appending one. The thing that bothers me is that there doesn't appear to be anything segmenting the channels, so you potentially just have one long list of messages from different channels.
@Ashex Looking at the code though, doesn't it simply chop it down to 19 elements for the context object, while the size in Redis/storage is unlimited? Only the context object seems to be affected here; the original history object remains the same. If that really contains messages from all the channels, it would be a truly huge amount. It really should be limited wherever it comes from, like you say.
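Put schematically (hypothetical names again), the distinction being discussed is that the slice bounds only what is handed to analysis, not what is written back to storage:

```python
# Hypothetical illustration: the 19-element slice bounds the context object,
# while the full, ever-growing history is what gets persisted.
context = history[-19:]                      # bounded: last 19 messages only
storage.set("message_history", history)      # unbounded: the whole list goes back
```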
I'm crawling through things, and from what I see the analysis backend is called via the pubsub backend, which is triggered for each incoming message. Per the doc on the history analysis backend, it captures the last 20 messages so they can be used for contextual analysis of a command; therefore it needs to be recording messages from every channel will is a member of.

This is a marginally educated guess, as I haven't a clue how the HipChat backend functions, but we're using the Slack RTM feed, so will is actually processing every message in the Slack workspace, which could mean those are all being added to …

And you're absolutely right about the history object; I completely missed that. I don't really see where else …

Seems like sharding by channel within the dictionary would help, as we could then chop the channel's message history down to 19 any time a message from that channel came in. As it stands …
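A rough sketch of that sharding idea, with a hypothetical dictionary structure and function names (only the 19-message cap comes from the discussion above, the rest is assumed): keep one list per channel and trim it on every write.

```python
# Hypothetical per-channel history, capped on every write -- a sketch of the
# idea above, not Will's existing storage layout.
MAX_CONTEXT_MESSAGES = 19

def record_message(history, channel_id, message):
    """history: dict mapping channel id -> list of that channel's recent messages."""
    channel_history = history.setdefault(channel_id, [])
    channel_history.append(message)
    del channel_history[:-MAX_CONTEXT_MESSAGES]   # keep only the newest 19
    return history
```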
Would it be better to switch to the Slack Events API? That would drastically reduce the amount of data will has to process.
I've actually been dealing with this issue for a few months and haven't gotten to the bottom of it. Will is continuously consuming 100% CPU on a single core and has continued to do so despite allocating more CPU to it (I upgraded the AWS instance type from m4.medium to c4.4xlarge).
I've also been dealing with an issue that might be related, where Will takes 10 seconds or more to respond to a command; it seems to only get worse until Will is restarted.
I would love to debug and investigate this further, but it's not clear how to do so.
Edit: I tweaked the logging for the execution of _watch_rtm_thread to get the process id, and that is not the thread causing the high CPU usage.
Edit 2: I added the same logging to everything using the Process command and have observed that the Analysis Thread (in bootstrap_analysis) consumes 100% CPU when a command is issued.
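For reference, a logging helper along the lines of the tweak described above (hypothetical, not taken from Will's source): log the OS pid and thread name at the top of each worker loop so that top -H / ps -eLf output can be matched back to the thread that is spinning.

```python
# Hypothetical helper for identifying which thread/process is burning CPU.
import logging
import os
import threading

def log_worker_identity(label):
    logging.info(
        "%s running in pid=%s thread=%s",
        label, os.getpid(), threading.current_thread().name,
    )

# e.g. call log_worker_identity("bootstrap_analysis") at the start of the
# analysis loop, then compare against `top -H -p <pid>` output.
```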