raw_exec plugin windows: "logmon: Unrecognized remote plugin message" #11939
Comments
Hey @cattuz, this issue has come up a few times, but we haven't found a solid root cause yet. One of the recent users to encounter this with the same plugin noted:
Based on your number of allocations, do you think this is the bug you're experiencing?
The allocation counts are as follows: My experience is that it has to do with how long Nomad or the allocations have been running. There is no problem starting around 40 allocations on a fresh VM, but when an allocation gets scheduled on the same VM after a few hours, it fails with the logmon error. I'm running Nomad as a Windows service on the VM. Would using a scheduled task or something else be better? Edit: Edit 2: I will try increasing the subprocess limit and see if that changes anything.
Indeed, the Windows "desktop heap" seems to have been the problem. More discussion in the service limits Stack Overflow thread. After increasing the service heap limit with the following PowerShell:
# Increase service heap limit, controlling how many subprocesses a service can spawn
$heapLimits = "%SystemRoot%\system32\csrss.exe ObjectDirectory=\Windows SharedSection=1024,20480,4096 Windows=On SubSystemType=Windows ServerDll=basesrv,1 ServerDll=winsrv:UserServerDllInitialization,3 ServerDll=sxssrv,4 ProfileControl=Off MaxRequestThreads=16"
Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Control\Session Manager\SubSystems" -Name "Windows" -Value $heapLimits
...all the logmon issues have gone away. It does seem quite logical in hindsight, although I've never encountered those Windows service limits before. There seem to be some caveats, such as the change affecting all services running on the VM and potentially making them consume more resources, but so far I have noticed no adverse effects.
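For anyone applying the same fix, here is a minimal sketch of changing only the third `SharedSection` value (the desktop heap size, in KB, for non-interactive window stations, which is where services run) instead of overwriting the whole string. It assumes the existing value contains a three-part `SharedSection=a,b,c` entry like the one above. The helper name `Update-SharedSectionHeap` and the shortened sample string are hypothetical, not part of Nomad or Windows. The registry lines are left commented out: they would need an elevated session, and a change to this value typically only takes effect after a reboot.

```powershell
# Hypothetical helper (an assumption for illustration, not from the thread):
# returns the given SubSystems "Windows" string with only the third
# SharedSection value replaced; the rest of the string is left untouched.
function Update-SharedSectionHeap {
    param(
        [Parameter(Mandatory)] [string] $WindowsValue,
        [Parameter(Mandatory)] [int] $ServiceHeapKB
    )
    $pattern = 'SharedSection=(\d+),(\d+),(\d+)'
    if ($WindowsValue -notmatch $pattern) {
        throw "No three-part SharedSection entry found in the value."
    }
    # $1 and $2 are regex group references (single quotes keep PowerShell
    # from expanding them), so the first two values are carried over as-is.
    return $WindowsValue -replace $pattern, ('SharedSection=$1,$2,' + $ServiceHeapKB)
}

# Illustrative, shortened sample string (not a complete real value).
$sample = '%SystemRoot%\system32\csrss.exe SharedSection=1024,20480,768 Windows=On'
Update-SharedSectionHeap -WindowsValue $sample -ServiceHeapKB 4096

# Sketch of applying it to the real value (assumption: elevated session).
# Reading with DoNotExpandEnvironmentNames keeps %SystemRoot% unexpanded.
# $key = 'HKLM:\SYSTEM\CurrentControlSet\Control\Session Manager\SubSystems'
# $current = (Get-Item -Path $key).GetValue('Windows', $null, 'DoNotExpandEnvironmentNames')
# $current | Out-File "$env:TEMP\SubSystems-Windows-backup.txt"   # keep a copy first
# Set-ItemProperty -Path $key -Name 'Windows' -Value (Update-SharedSectionHeap -WindowsValue $current -ServiceHeapKB 4096)
```

Editing the existing string this way avoids clobbering any other settings in the value that differ between Windows versions, and the backup line gives a way to restore the original if other services misbehave afterwards.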
Nomad version
Nomad v1.2.2 (78b8c171a211f967a8b297a88a7e844b3543f2b0)
Operating system and Environment details
Windows Server 2016 on an Azure VM
Issue
I get the following message intermittently when using the raw_exec plugin on Windows:
logmon: Unrecognized remote plugin message: This usually means that the plugin is either invalid or simply needs to be recompiled to support the latest protocol.
It happens multiple times in a row, and only appears to be resolved when switching the job to another VM. Other identical allocations can start after an allocation fails like this, so it seems like it's something in the plugin itself that's not working.
Reproduction steps
Unable to reproduce reliably.
Expected Result
No plugin error.
Actual Result
Plugin error.
Job file (if appropriate)
I've tried this:
and wrapped in a script:
Both with the same result.
Nomad Server logs (if appropriate)
No errors or warnings in server logs.
Nomad Client logs (if appropriate)
No errors or warnings in client logs.