Connectivity problems with the RETORCH agent #144
Comments
@augustocristian I have checked the host settings. No changes were made this year (the only exception is the disk size extension of the retorch VM several months ago). The current configuration of this VM seems to be the same as that of the other VMs:
Currently, the host is not under memory pressure: free, unused host memory is 78GB. The VM is taking 4098MB. I'm doing some tests now...
@augustocristian I reran one of the failed PRs, #139, and observed from the host:
From this scenario;
@javiertuya thanks for everything. I checked this morning and reinstalled the kernel recommended by Microsoft
@augustocristian If the VM did not have this kernel installed, this is the most probable cause:
I rebooted the VM several times and conducted multiple tests using the original kernel, the Azure kernel, and other kernels recommended by the community to address the memory allocation issues. None of these solutions worked. When I start modifying the RETORCH tool, I plan to include this debugging information to prevent similar issues in the future (currently, it only logs the Docker and Compose versions).
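A minimal sketch of the kind of debugging information mentioned above, assuming a Linux agent with Python available. The function names `collect_debug_info` and `format_debug_info` are hypothetical and are not part of the RETORCH tool; the sketch only reports the running kernel release and the memory the guest actually sees, which are the two facts discussed in this thread.

```python
import os
import platform


def collect_debug_info():
    """Gather kernel and memory facts from the running Linux system.

    Hypothetical helper, not part of RETORCH.
    """
    page_size = os.sysconf("SC_PAGE_SIZE")    # bytes per memory page
    phys_pages = os.sysconf("SC_PHYS_PAGES")  # pages visible to the guest
    return {
        "kernel": platform.release(),
        "mem_total_mib": page_size * phys_pages // (1024 * 1024),
    }


def format_debug_info(info):
    """Render the collected facts as a single key=value log line."""
    return " ".join(f"{key}={info[key]}" for key in sorted(info))


if __name__ == "__main__":
    print(format_debug_info(collect_debug_info()))
```

Logging this line at the start of each pipeline run would make a change of kernel or of detected memory visible in the CI output, next to the Docker and Compose versions.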
@augustocristian So, I guess that in the afternoon the VM was not using the Azure kernel, right? This is the official kernel that supports the host integration services, and one of the most important of these services is dynamic memory management. Tell me whether rerunning all the failing updates succeeds in performing the merges
All the branches (apart from the Jupyter ones) are passing @javiertuya. I'm waiting to include the changes of #145 in order to update #143 and check which changes should be included to solve the problem
For the past two weeks, we've been experiencing problems with all the pipelines in the CI system. I've been researching the problem, and the first insights point to connectivity issues with the agent.
I think I've found the root cause: it seems that the virtual machine is not detecting all the installed memory, which prevents the containers from running in parallel (I've noticed a significant slowdown when attempting to perform tasks manually during execution).
I observed this using the htop command: the VM starts increasing its use of swap space instead of using the available memory:
I've attempted to restore the only thing that changed from the previous executions: the Docker version. I created a script to remove the current version and manually install an older one (26.0.1), but the problem remains.
The content of /proc/meminfo is consistent with htop:
Could there be some issue or misconfiguration with the hypervisor? @javiertuya
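The check against /proc/meminfo can also be scripted, so the same comparison can be repeated on the agent. The following is a minimal sketch, assuming the standard Linux /proc/meminfo layout of `Key:  value kB` lines; the helper names are hypothetical. It reports the memory the VM detects and how much swap is in use.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "Key:" prefix
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def memory_summary(info):
    """Summarize detected memory and swap in use, in MiB."""
    swap_used_kb = info.get("SwapTotal", 0) - info.get("SwapFree", 0)
    return {
        "mem_total_mib": info.get("MemTotal", 0) // 1024,
        "mem_available_mib": info.get("MemAvailable", 0) // 1024,
        "swap_used_mib": swap_used_kb // 1024,
    }


def read_memory_summary(path="/proc/meminfo"):
    """Read the file at `path` and return its memory summary."""
    with open(path) as handle:
        return memory_summary(parse_meminfo(handle.read()))


if __name__ == "__main__":
    print(read_memory_summary())
```

If `mem_total_mib` stays well below the memory assigned to the VM while `swap_used_mib` grows during a pipeline run, that would match the behaviour seen in htop.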