Pulling images fails when image is bigger #242
Comments
Which version are you running? Specifically, is this the version that still uses celery? There is a timeout parameter that needs to be increased.
I’m using 18.03.
I had to remind myself how I had fixed this before. The issue is with the gunicorn timeout. You just need to modify the service script to add the -t 3600 option. That gives it an hour, which should be enough even for very large images.
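For anyone who prefers a config file over editing flags, here is a minimal sketch of the same change. gunicorn can read its settings from a Python file, and the values below just mirror the service script posted in this issue with the timeout raised to 3600 seconds. The file name `gunicorn.conf.py` and loading it with `-c` are assumptions about your deployment, not something Shifter ships.

```python
# gunicorn.conf.py: hypothetical config file, not part of Shifter.
# Assumed to be loaded with:
#   gunicorn -c gunicorn.conf.py shifter_imagegw.api:app

# Values copied from the ExecStart line reported in this issue.
bind = "0.0.0.0:6000"
backlog = 2048
loglevel = "debug"
accesslog = "/var/log/shifter_imagegw/access.log"
errorlog = "/var/log/shifter_imagegw/error.log"
workers = 4
threads = 4
worker_class = "gevent"

# The actual fix: give each request up to an hour (was 60 seconds).
timeout = 3600
```

Either approach should have the same effect; adding `-t 3600` to the existing service script is the smaller change.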
Thank you for the response. I have another question. Why does pulling and converting an image take so much more time than with docker? Is there a way to improve that? Happy new year :)
I meant to reply to this sooner. Shifter has to do the expansion and squash on each fresh pull. It does cache the layers, but it has to re-unzip each layer to build the squash image. I have noticed that the unzip for some layers can be very slow but have never been able to get to the bottom of it. I think it has something to do with the zip python library we use and how we are using it. I would recommend using a fast file system for the temporary space where the API/worker runs. If it is a large memory node (> ~32 GB), you can even use /dev/shm for the expand directory. This can help to some degree. Let me know if you need the exact parameter to adjust.
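Before pointing the expand directory at /dev/shm, it may be worth checking that it has enough free space for the image being expanded. The snippet below is only a rough sketch: the helper name `has_room` and the 2x headroom factor are assumptions for illustration, not values taken from Shifter.

```python
import os
import shutil


def has_room(path, image_bytes, headroom=2.0):
    """Return True if `path` exists and has headroom * image_bytes free.

    The headroom factor is a guess: expanded layers usually take more
    space than the compressed image, but the real ratio varies.
    """
    if not os.path.isdir(path):
        return False
    free = shutil.disk_usage(path).free
    return free >= image_bytes * headroom


if __name__ == "__main__":
    # Roughly the 4 GB image size mentioned in this issue.
    four_gib = 4 * 1024 ** 3
    print("/dev/shm looks usable:", has_room("/dev/shm", four_gib))
```

If the check fails, a fast local disk is probably the safer choice for the temporary space.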
Thank you, Canon. Your answer is very helpful. Would you give me the exact parameter to adjust?
Look at this line in the example config file. You just need to change that to /dev/shm or some other location that is on fast local storage or RAM. https://github.com/NERSC/shifter/blob/master/imagegw/imagemanager.json.example#L12
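If you would rather script the change than edit the file by hand, something like the following might work. Note the assumptions: the key name `ExpandDirectory` is my reading of the linked line and should be checked against your own imagemanager.json, and the sample value `/images/expand/` is only a placeholder. The helper `set_expand_dir` is hypothetical and not part of Shifter.

```python
import json


def set_expand_dir(config_text, new_dir, key="ExpandDirectory"):
    """Return the JSON config text with `key` set to `new_dir`.

    `key` defaults to "ExpandDirectory", which is an assumption about
    the config file; verify the name in your own imagemanager.json.
    """
    config = json.loads(config_text)
    if key not in config:
        # Refuse to add an unknown key rather than silently do nothing useful.
        raise KeyError("%r not found in config" % key)
    config[key] = new_dir
    return json.dumps(config, indent=4)


if __name__ == "__main__":
    # Placeholder config fragment, for illustration only.
    sample = '{"ExpandDirectory": "/images/expand/"}'
    print(set_expand_dir(sample, "/dev/shm"))
```

The image gateway presumably needs a restart afterwards to pick up the new location.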
Hi. I want to download and convert some docker images to shifter images.
Images smaller than about 4 GB download successfully,
but the problem is with images bigger than about 4 GB.
There are endless "PULLING" messages and the pull never finishes, like this:
Message: {
"ENTRY": "MISSING",
"ENV": "MISSING",
"WORKDIR": "MISSING",
"groupACL": [],
"id": "MISSING",
"itype": "docker",
"last_pull": 1549005472.860614,
"status": "PULLING",
"status_message": "Extracting Layers",
"system": "mycluster",
"tag": [],
"userACL": []
}
2019-02-01T07:58:59 Pulling Image: docker:image_name:1.0.0-12, status: PULLING
And here are the access and error logs while pulling an image.
==> error.log <==
[2019-02-01 07:59:37 +0000] [51665] [DEBUG] Closing connection.
[2019-02-01 07:59:37 +0000] [51691] [DEBUG] POST /api/pull/mycluster/docker/si-swhong%3A1.0.0-12/
[2019-02-01 07:59:37 +0000] [51691] [DEBUG] pull system=mycluster imgtype=docker tag=si-swhong:1.0.0-12
[2019-02-01 07:59:37 +0000] [51691] [DEBUG] {'tag': u'si-swhong:1.0.0-12', 'itype': u'docker', 'system': u'mycluster'}
[2019-02-01 07:59:37 +0000] [51691] [DEBUG] {'magic': 'imagemngrmagic', 'uid': 0, 'system': u'mycluster', 'tokens': {u'soe-db1:5000': u'u:p', u'default': u'u:p'}, 'gid': 0, 'user': 'root', 'group': 'root'}
[2019-02-01 07:59:37 +0000] [51691] [DEBUG] Pull called Test Mode=0
[2019-02-01 07:59:37 +0000] [51691] [DEBUG] {u'status': u'PULLING', u'ostcount': u'0', u'itype': u'docker', u'format': u'squashfs', u'last_heartbeat': 1549005477.959195, u'os': u'linux', u'groupACL': [], u'system': u'mycluster', u'private': None, u'status_message': u'Extracting Layers', u'pulltag': u'si-swhong:1.0.0-12', u'replication': u'1', u'tag': [], u'userACL': [], u'location': u'', u'last_pull': 1549005472.860614, u'remotetype': u'dockerv2', u'_id': ObjectId('5c53f2a0227509ba7a533871'), u'arch': u'amd64'}
...
[2019-02-01 08:19:08 +0000] [51691] [DEBUG] Closing connection.
[2019-02-01 08:19:09 +0000] [51659] [CRITICAL] WORKER TIMEOUT (pid:51666)
[2019-02-01 08:19:09 +0000] [51666] [WARNING] 1
[2019-02-01 08:19:09 +0000] [51666] [ERROR] ERROR: dopull failed system=mycluster tag=si-swhong:1.0.0-12
[2019-02-01 08:19:09 +0000] [51666] [INFO] Worker exiting (pid: 51666)
[2019-02-01 08:19:09 +0000] [51685] [WARNING] Operation failed for 5c5400b6227509c9d2e26974
[2019-02-01 08:19:09 +0000] [51685] [INFO] Shutting down Status Thread
[2019-02-01 08:19:09 +0000] [51691] [DEBUG] POST /api/pull/mycluster/docker/si-swhong%3A1.0.0-12/
[2019-02-01 08:19:09 +0000] [51691] [DEBUG] pull system=mycluster imgtype=docker tag=si-swhong:1.0.0-12
[2019-02-01 08:19:09 +0000] [51691] [DEBUG] {'tag': u'si-swhong:1.0.0-12', 'itype': u'docker', 'system': u'mycluster'}
[2019-02-01 08:19:09 +0000] [51691] [DEBUG] {'magic': 'imagemngrmagic', 'uid': 0, 'system': u'mycluster', 'tokens': {u'soe-db1:5000': u'u:p', u'default': u'u:p'}, 'gid': 0, 'user': 'root', 'group': 'root'}
[2019-02-01 08:19:09 +0000] [51691] [DEBUG] Pull called Test Mode=0
[2019-02-01 08:19:09 +0000] [51691] [DEBUG] {u'status': u'FAILURE', u'ostcount': u'0', u'itype': u'docker', u'format': u'squashfs', u'last_heartbeat': 1549009149.245521, u'os': u'linux', u'groupACL': [], u'system': u'mycluster', u'private': None, u'status_message': u'FAILURE', u'pulltag': u'si-swhong:1.0.0-12', u'replication': u'1', u'tag': [], u'userACL': [], u'location': u'', u'last_pull': 1549009078.383216, u'remotetype': u'dockerv2', u'_id': ObjectId('5c5400b6227509c9d2e26974'), u'arch': u'amd64'}
==> access.log <==
127.0.0.1 - - [01/Feb/2019:07:59:37 +0000] "POST /api/pull/mycluster/docker/si-swhong%3A1.0.0-12/ HTTP/1.1" 200 290 "-" "-"
I have tried upgrading gunicorn (to 19.9) and also installing gevent (1.3.6), but that did not help. This is the ExecStart line from my service script:
ExecStart=/usr/bin/gunicorn
-b 0.0.0.0:6000 --backlog 2048
--log-level=debug
--access-logfile=/var/log/shifter_imagegw/access.log
--log-file=/var/log/shifter_imagegw/error.log
--timeout 60
--workers 4
--threads 4
--worker-class=gevent
shifter_imagegw.api:app
How can I resolve this problem?