"Stale file handle" can cause crash loop #6

Closed
consideRatio opened this issue Nov 13, 2023 · 1 comment

Comments

@consideRatio

consideRatio commented Nov 13, 2023

Issue observed by @GeorgianaElena in 2i2c-org/infrastructure#3224 (comment)

The dirsize reporter was found to crash with OSError: [Errno 116] Stale file handle: '/shared-volume'. Within the scope of this project, I think the action point is to make the exporter more robust against this kind of failure.

image

kubectl describe output for a crashing pod, followed by the container's log output:

Events:
  Type     Reason  Age                     From     Message
  ----     ------  ----                    ----     -------
  Warning  Failed  40m (x4685 over 7d1h)   kubelet  (combined from similar events): Error: context deadline exceeded
  Normal   Pulled  11m (x16672 over 55d)   kubelet  Container image "quay.io/yuvipanda/prometheus-dirsize-exporter:v2.0" already present on machine
  Warning  Failed  6m6s (x2547 over 7d1h)  kubelet  Error: context deadline exceeded
Updated values for maxrjones
Updated values for damianavila
Traceback (most recent call last):
  File "/usr/local/bin/dirsize-exporter", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.11/site-packages/prometheus_dirsize_exporter/exporter.py", line 157, in main
    for subdir_info in walker.get_subdirs_info(args.parent_dir):
  File "/usr/local/lib/python3.11/site-packages/prometheus_dirsize_exporter/exporter.py", line 116, in get_subdirs_info
    for c in self.do_iops_action(os.listdir, dir_path)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/prometheus_dirsize_exporter/exporter.py", line 50, in do_iops_action
    return_value = func(*args, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^
OSError: [Errno 116] Stale file handle: '/shared-volume'

I'll open a report in the dirsize reporter project about this kind of error. I'm not sure what the resolution is, but it shouldn't be a blocker for the k8s upgrade aspect.
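To make "more robust" concrete, here is a minimal sketch of one possible approach, not the exporter's actual code: catch only the stale-handle error around the os.listdir call seen in the traceback and skip that pass instead of letting the process exit. The helper name listdir_or_none and the injectable listdir parameter are hypothetical and exist only for illustration. It assumes a platform where errno.ESTALE is defined (on Linux it is errno 116).

```python
import errno
import logging
import os

log = logging.getLogger("dirsize-sketch")


def listdir_or_none(dir_path, listdir=os.listdir):
    """Return the entries of dir_path, or None if its file handle is stale.

    Hypothetical helper, not part of prometheus-dirsize-exporter.
    Only ESTALE is swallowed; every other OSError still propagates.
    """
    try:
        return listdir(dir_path)
    except OSError as e:
        if e.errno != errno.ESTALE:
            raise
        log.warning("Stale file handle on %s, skipping this pass", dir_path)
        return None


if __name__ == "__main__":
    # Simulate the failure from the traceback without needing an NFS mount.
    def fake_stale_listdir(path):
        raise OSError(errno.ESTALE, "Stale file handle", path)

    print(listdir_or_none("/shared-volume", listdir=fake_stale_listdir))
```

A caller would treat None as "no data this round" and try again on the next collection cycle. Whether a later attempt can succeed without remounting the volume is an open question here; the sketch only avoids the crash loop and does not repair the mount.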

@yuvipanda
Owner

See #7. Needs some more work though.
