After upgrading to iRODS 4.3, I'm seeing a problem where one or both of the two iRODS iCAT servers I run hang and time out when starting the containers in docker compose. I did not experience this with the v4.2 versions of this image. The 4.3 upgrade has been merged in the dev branch. The docker-compose.yml used to launch the containers is here.
When this happens, the iRODS server is in an unresponsive state. Local iinit as well as external connection attempts from the docker host time out. The other servers in the network (postgres and redis) are working fine and remain reachable. Docker compose output displays the following:
irods-1 | Start iRODS
irods-1 | Test iinit
irods-1 | remote addresses: 172.20.0.5 ERROR: _rcConnect: connectToRhost timed out, retrying
irods-1 | WARNING: Error 25 disabling echo mode. Password will be displayed in plaintext.
irods-1 | Enter your current iRODS password:
irods-1 | remote addresses: 172.20.0.5 ERROR: _rcConnect: connectToRhost error, server on irods:1247 is probably down status = -115000 SYS_SOCK_READ_TIMEDOUT
irods-1 | remote addresses: 172.20.0.5 ERROR: Saved password, but failed to connect to server irods
irods-1 | iinit failed
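For context, the reachability checks above amount to roughly the following (a sketch; the service name, host port mapping and availability of nc are assumptions):

```
# Sketch only: assumes port 1247 is published to the docker host and that the
# compose service is named "irods" (as suggested by the log prefix above).
nc -vz localhost 1247              # TCP connect check from the docker host
docker compose exec irods iinit    # local iinit inside the container (may prompt for the iRODS password)
```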
During this, running /etc/init.d/irods status in the container does display running iRODS processes, but curiously no irodsDelayServer:
root@irods:/# /etc/init.d/irods status
irodsServer :
Process 477
Process 478
Process 492
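A quick way to confirm whether the delay server process is present (a sketch, assuming ps is available in the container):

```
# Sketch: look for the delay server among the container's processes.
ps -ef | grep '[i]rodsDelayServer' || echo "irodsDelayServer not running"
```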
SSH'ing into the container and running /etc/init.d/irods restart makes the server work normally again. Most of the time, restarting the container does the same.
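For reference, the same workaround without SSH'ing in would be roughly the following (the service name "irods" is an assumption taken from the compose log prefix):

```
docker compose exec irods /etc/init.d/irods restart   # restart iRODS inside the container
docker compose restart irods                          # or restart the whole container
```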
I have not yet found a way to consistently reproduce this, as it occurs seemingly at random. It seems to happen most often after rebuilding the image and starting it for the first time, but not exclusively.
iRODS log says the following:
{"log_category":"server","log_level":"info","log_message":"Initializing server ...","server_host":"irods","server_pid":477,"server_timestamp":"2024-09-19T10:41:46.139Z","server_type":"server","server_zone":"sodarZone"}
{"log_category":"server","log_level":"info","log_message":"Setting up UNIX domain socket for agent factory ...","server_host":"irods","server_pid":477,"server_timestamp":"2024-09-19T10:41:46.141Z","server_type":"server","server_zone":"sodarZone"}
{"log_category":"server","log_level":"info","log_message":"Forking agent factory ...","server_host":"irods","server_pid":477,"server_timestamp":"2024-09-19T10:41:46.141Z","server_type":"server","server_zone":"sodarZone"}
{"log_category":"server","log_level":"info","log_message":"Connecting to agent factory [agent_factory_pid=478] ...","server_host":"irods","server_pid":477,"server_timestamp":"2024-09-19T10:41:46.142Z","server_type":"server","server_zone":"sodarZone"}
{"log_category":"agent_factory","log_level":"info","log_message":"Initializing agent factory ...","server_host":"irods","server_pid":478,"server_timestamp":"2024-09-19T10:41:46.143Z","server_type":"agent_factory","server_zone":"sodarZone"}
{"local_hostname":"localhost, irods","log_category":"server","log_level":"info","port":"1247","server_host":"irods","server_pid":477,"server_timestamp":"2024-09-19T10:41:46.190Z","server_type":"server","server_zone":"sodarZone"}
{"log_category":"server","log_level":"info","server_host":"irods","server_pid":477,"server_timestamp":"2024-09-19T10:41:46.206Z","server_type":"server","server_zone":"sodarZone","zone_info.host":"localhost","zone_info.name":"sodarZone","zone_info.port":"1247","zone_info.type":"LOCAL_ICAT"}
{"log_category":"server","log_level":"info","log_message":"rodsServer Release version rods4.3.1 - API Version d is up","server_host":"irods","server_pid":477,"server_timestamp":"2024-09-19T10:41:46.207Z","server_type":"server","server_zone":"sodarZone"}
{"log_category":"legacy","log_level":"info","log_message":">>> control plane :: listening on port 1248\n","server_host":"irods","server_pid":477,"server_timestamp":"2024-09-19T10:41:46.208Z","server_type":"server","server_zone":"sodarZone"}
{"log_category":"server","log_level":"error","log_message":"readWorkerTask - readStartupPack failed. -4000","server_host":"irods","server_pid":477,"server_timestamp":"2024-09-19T10:41:48.177Z","server_type":"server","server_zone":"sodarZone"}
The final readStartupPack failed line looks interesting. However, it also seems to appear on successful startups?
How to Reproduce
Start, stop and restart the docker compose environment 1-N times (a crude loop sketch follows below).
Oddly, this seems to happen at different intervals depending on the network I'm connected to when running the docker compose environment. However, I can't say for sure this isn't just a coincidence.
Also, I have not witnessed this happening when starting the image in GitHub Actions as part of SODAR server CI.
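A crude loop for trying to trigger the hang could look like the following sketch (the service name, probe command and timings are assumptions, not part of the actual setup):

```
#!/usr/bin/env bash
# Sketch: cycle the compose environment repeatedly and probe iRODS from inside
# the container, stopping once it becomes unresponsive. Assumes the exec user
# in the "irods" service already has an initialized iRODS environment.
set -u
for i in $(seq 1 20); do
  docker compose down
  docker compose up -d
  sleep 30   # arbitrary grace period for the iCAT server to come up
  if ! docker compose exec -T irods ils >/dev/null 2>&1; then
    echo "Run $i: iRODS unresponsive"
    break
  fi
  echo "Run $i: OK"
done
```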
Further Steps for Debugging
- Upgrade from 4.3.1 to 4.3.3
  - Also happens on 4.3.3
- Enable debug logging in iRODS (see the sketch after this list)
- Search for reports of similar problems and solutions, especially regarding irodsDelayServer
  - Could not find anything that seems relevant
- If all else fails, contact iRODS support
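For the debug-logging step, a minimal sketch assuming the 4.3-style log_level block in /etc/irods/server_config.json and that jq is installed in the container:

```
# Sketch: raise every log category in server_config.json to "trace", then restart.
jq '.log_level |= with_entries(.value = "trace")' /etc/irods/server_config.json \
  > /tmp/server_config.json
cp /tmp/server_config.json /etc/irods/server_config.json
/etc/init.d/irods restart
```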
This hasn't happened lately in dev, except occasionally after rebuilding the image. I'll keep looking into it and see if it also occurs in staging/production; if not, I won't consider this a blocker for v4.3.3-1.
Sadly, I have observed this also happening in staging after rebuilding the server. I should try to debug the issue with iRODS support. First, I should most likely enable trace logging and trigger the freeze-up with a rebuild.
A minimal example for reproducing this is here. It seems to (mostly? exclusively?) affect rebuilding an existing image. Maybe some critical file is not correctly stored in the volumes?
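For reference, "rebuilding" above means roughly the following sequence (a sketch; the service name is an assumption and existing named volumes are left in place):

```
docker compose build --no-cache irods   # rebuild the image without cache
docker compose up -d                    # recreate and start the containers
```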