GH action to generate report #199

Draft: wants to merge 96 commits into main

@asmacdo (Member) commented Sep 25, 2024

Fixes #177

Step 1: Create Skeleton

  • Authenticate with AWS
  • Connect to the K8s cluster
  • Deploy our job-runner pod onto a Karpenter NodeClaim
  • Create a SPOT node as needed
  • Run a dummy job
  • Delete the pod
  • Scale down

I've verified that when a user node is available (created by running a tiny JupyterHub), the job pod schedules on that node. I then shut down my JupyterHub and all user nodes scaled down. I reran the job: Karpenter scaled up a new spot node, the pod was scheduled on it, ran successfully, and was deleted, and the node was cleaned up. Step 1 complete!

Step 2: Generate Report

  • Connect Pod to EFS
  • List users
  • du each user
  • du shared
  • collate data into report
  • Double-check that nodes come up and down successfully
  • Run the job several times in one day, then check the next day for an EFS usage spike (IIUC we should be fine because EFS is in Bursting mode)
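A minimal Python sketch of the "list users / du each user / du shared / collate" steps, assuming every top-level directory under the EFS mount (user homes and any shared area alike) should get its own entry; the function names are hypothetical. It sums apparent sizes of regular files without following symlinks, so it is closer to `du --apparent-size` than to `du`, though it ignores the sizes of directory entries themselves.

```python
import os
import stat


def dir_usage(path: str) -> dict:
    """Sum apparent sizes of regular files under path, without following symlinks."""
    total_size = 0
    file_count = 0
    # os.walk does not descend into symlinked directories by default (followlinks=False)
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                st = os.lstat(os.path.join(root, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode):  # ignore symlinks, sockets, etc.
                total_size += st.st_size
                file_count += 1
    return {"total_size": total_size, "file_count": file_count}


def collate_report(mount_point: str) -> dict:
    """Map each top-level (non-symlink) directory under mount_point to its usage."""
    with os.scandir(mount_point) as entries:
        top_level = sorted(
            (e for e in entries if e.is_dir(follow_symlinks=False)),
            key=lambda e: e.name,
        )
    return {e.name: dir_usage(e.path) for e in top_level}
```

For example, `collate_report("/mnt/efs")` would return one `{"total_size": ..., "file_count": ...}` entry per top-level directory.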

Step 3: Push Report

  • Create a private GitHub repository to store reports
  • Configure bot permissions to push to the repo
  • Push the report to the repo on completion

Questions to answer:

  • If a SPOT node is preempted, can we redeploy later?

Comment on lines 4 to 6

  pull_request:
    branches:
      - main

Member:

Should we also run this on a weekly basis?

@asmacdo (Member, Author):

It runs on PR pushes just so I can test easily, but yes, once a week sounds good to me. @kabilar, do you have a preference for day/time?

@kabilar (Member) Sep 26, 2024:

Great, thank you. How about Mondays at 6am EST? We can then review the report on Monday mornings.
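GitHub Actions evaluates `schedule` cron expressions in UTC, so "Mondays at 6am EST" has to be converted. A small Python sketch, assuming a fixed UTC offset (EST is UTC-5; during daylight saving time New York is on EDT, UTC-4, so a fixed UTC cron will shift by an hour in local time for part of the year). The helper name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def monday_cron_utc(hour_local: int, utc_offset_hours: int) -> str:
    """Cron expression (in UTC) for Mondays at hour_local in a fixed-offset timezone."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    # 2024-01-01 was a Monday; any Monday works as a reference date
    local = datetime(2024, 1, 1, hour_local, tzinfo=tz)
    utc = local.astimezone(timezone.utc)
    # cron day-of-week: 0 = Sunday, 1 = Monday; isoweekday(): Monday = 1 ... Sunday = 7
    day_of_week = utc.isoweekday() % 7
    return f"0 {utc.hour} * * {day_of_week}"


print(monday_cron_utc(6, -5))  # 6am EST -> 0 11 * * 1
print(monday_cron_utc(6, -4))  # 6am EDT -> 0 10 * * 1
```

The resulting expression would go under `on: schedule: - cron:` in the workflow, next to the existing `pull_request` trigger.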

@kabilar (Member) commented Nov 7, 2024

Hi @asmacdo, just checking in to see how the report generation is going. Thanks.

@asmacdo force-pushed the cron-data-usage-report branch 3 times, most recently from a4b44f9 to 965a81e (November 11, 2024 16:55)

@asmacdo (Member, Author) commented Nov 11, 2024

Ran into some problems:

The du script ran for about 50 minutes, and then the pod disappeared without logs.

Worse, it also kicked out my JupyterHub pod as well as another user's.

[I 2024-11-11 17:57:57.999 JupyterHub log:192] 200 GET /hub/error/503?url=%2Fuser%2Fasmacdo%2Fterminals%2Fwebsocket%2F1 (@100.64.247.104) 7.52ms
[W 2024-11-11 17:57:59.266 JupyterHub base:1254] User asmacdo server stopped, with exit code: 1
[I 2024-11-11 17:57:59.266 JupyterHub proxy:357] Removing user asmacdo from proxy (/user/asmacdo/)

I think this means we need to take a different approach. Setting resource limits should have isolated our job from the other pods, but since I have no other logs about what happened here, I think we need a more conservative approach that is completely isolated from user pods.

I did it this way because I thought it would be simpler, but if there's any chance that we affect a running user pod, we would be better off deploying a separate EC2 instance and mounting the EFS directly, avoiding Kubernetes altogether.

@kabilar (Member) commented Nov 14, 2024

we would be better off directly deploying a separate EC2 instance and bind the EFS directly, avoiding Kubernetes altogether.

Thanks @asmacdo. That makes sense.

@asmacdo force-pushed the cron-data-usage-report branch from 81b43e0 to 747f0a4 (December 2, 2024 17:47)
@@ -0,0 +1,63 @@
#!/usr/bin/env bash

set -e
Member:

Suggested change:
-set -e
+set -eu

@@ -0,0 +1,133 @@
#!/usr/bin/env bash

set -e
Member:

Suggested change:
-set -e
+set -eu

LOCAL_SCRIPTS_DIR=".github/scripts"
REMOTE_SCRIPTS_DIR="/home/ec2-user/scripts"
MOUNT_POINT="/mnt/efs"
ENV_FILE=".ec2-session.env"
Member:

I dislike that it would just dump into some hidden file in my current directory.
Could we write it into a temporary directory instead, and maybe export an environment variable with its path so the cleanup script can pick it up from the environment?

Suggested change:
-ENV_FILE=".ec2-session.env"
+ENV_FILE="/run/user/$(id -u)/ec2-session.env"

But then we might want to add logic to react if the file already exists, since that would likely mean cleanup did not remove it and the instance might still be running.
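A sketch of this suggestion, written in Python purely for illustration (the actual scripts are bash): put the session file in the per-user runtime directory and refuse to overwrite an existing one. The `XDG_RUNTIME_DIR` fallback to `/run/user/<uid>`, the `KEY=value` format, and both function names are assumptions.

```python
import os
from pathlib import Path


def session_env_path() -> Path:
    """Per-user runtime location for the session file (falls back to /run/user/<uid>)."""
    runtime_dir = os.environ.get("XDG_RUNTIME_DIR") or f"/run/user/{os.getuid()}"
    return Path(runtime_dir) / "ec2-session.env"


def write_session_env(path: Path, values: dict) -> None:
    """Create the session file as KEY=value lines; fail loudly if it already exists."""
    try:
        # mode "x" is exclusive creation: raises FileExistsError instead of overwriting
        with open(path, "x") as f:
            for key, value in values.items():
                f.write(f"{key}={value}\n")
    except FileExistsError:
        raise RuntimeError(
            f"{path} already exists: a previous cleanup may not have run "
            "and an EC2 instance may still be running"
        ) from None
```

Failing on an existing file turns a silently orphaned instance into an explicit error that the launch step can report.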

set -e

# Load environment variables from the file if they are not already set
ENV_FILE=".ec2-session.env"
Member:

Suggested change:
-ENV_FILE=".ec2-session.env"
+ENV_FILE="/run/user/$(id -u)/ec2-session.env"

@yarikoptic (Member) left a review:

Left comments on the report and on the directory walking.

The plan:

{
   "total": {
      "file_size": 123123123123,
      "file_count": 123
   },
   "caches": {
      "__pycache__": {  # count directories whose last path component is `__pycache__`
         "file_size": 12123123,
         "file_count": 12,
         "directories": [
            "a/e/__pycache__",
            "a/b/d/e/__pycache__",
            "a/e/__pycache__/c/__pycache__"  # should not happen in the wild
         ]
      },
      "pip_cache": {
         "file_size": 13,
         "file_count": 1,
         "directories": [
            "blah/.cache/pip"
         ]
      }
   }
}
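A Python sketch of how the planned `caches` section could be computed from an index of `(relative_path, size)` records. The matching rules (a directory named `__pycache__`; a path ending in `.cache/pip`) and the names `CACHE_RULES` and `summarize` are illustrative assumptions. Files under nested matches are attributed to the outermost match so they are not double-counted, which covers the `a/e/__pycache__/c/__pycache__` case.

```python
from pathlib import PurePosixPath
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical matching rules: cache name -> predicate on a directory path
CACHE_RULES: Dict[str, Callable[[PurePosixPath], bool]] = {
    "__pycache__": lambda d: d.name == "__pycache__",
    "pip_cache": lambda d: d.parts[-2:] == (".cache", "pip"),
}


def summarize(records: Iterable[Tuple[str, int]]) -> dict:
    """Aggregate (relative_path, size) file records into the planned report layout."""
    total = {"file_size": 0, "file_count": 0}
    caches = {
        name: {"file_size": 0, "file_count": 0, "directories": set()}
        for name in CACHE_RULES
    }
    for path_str, size in records:
        total["file_size"] += size
        total["file_count"] += 1
        # parents are innermost-first; reverse so the outermost ancestor is tested first
        ancestors = list(PurePosixPath(path_str).parents)[::-1]
        for name, matches in CACHE_RULES.items():
            match = next((d for d in ancestors if matches(d)), None)
            if match is not None:
                cache = caches[name]
                cache["file_size"] += size
                cache["file_count"] += 1
                cache["directories"].add(str(match))
    # Sets are not JSON-serializable; emit sorted lists instead
    for cache in caches.values():
        cache["directories"] = sorted(cache["directories"])
    return {"total": total, "caches": caches}
```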

.github/scripts/create-file-index.py (outdated, resolved)
import gzip
from datetime import datetime

def list_files_with_metadata(directory, output_file):
Member:

Write a simple test (it could probably even live in this file) that populates a directory with nested folders and symlinks, so you know the ground truth to aim for and compare against.


files_metadata = []

for root, dirs, files in os.walk(directory):
Member:

FTR: os.walk already seems to do the desired (right) thing and does not follow symlinked folders. We get

In [14]: list(os.walk('/tmp/1234'))
Out[14]: 
[('/tmp/1234', ['infinitum', 'subdir', 'linkgood'], []),
 ('/tmp/1234/subdir', ['subdir2'], ['file']),
 ('/tmp/1234/subdir/subdir2', [], ['file2'])]

for

❯ tree /tmp/1234
/tmp/1234
├── infinitum -> /tmp/1234
├── linkgood -> subdir
└── subdir
    ├── file
    └── subdir2
        └── file2

note: we do not monitor empty folders below
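A test with a known ground truth could start from the tree above. A minimal sketch (function names are hypothetical) that rebuilds it in a temporary directory and checks that `os.walk` lists symlinked directories in `dirnames` without descending into them, which also guards against the `infinitum` loop:

```python
import os
import tempfile


def build_tree(base: str) -> None:
    """Recreate the example tree: a self-referencing link, a relative link, nested files."""
    os.makedirs(os.path.join(base, "subdir", "subdir2"))
    open(os.path.join(base, "subdir", "file"), "w").close()
    open(os.path.join(base, "subdir", "subdir2", "file2"), "w").close()
    os.symlink(base, os.path.join(base, "infinitum"))  # points back at the root
    os.symlink("subdir", os.path.join(base, "linkgood"))  # relative link to subdir


def test_walk_does_not_follow_symlinked_dirs() -> None:
    with tempfile.TemporaryDirectory() as base:
        build_tree(base)
        # Sort names so the comparison does not depend on directory listing order
        walked = [
            (os.path.relpath(root, base), sorted(dirs), sorted(files))
            for root, dirs, files in os.walk(base)
        ]
        assert walked == [
            (".", ["infinitum", "linkgood", "subdir"], []),
            ("subdir", ["subdir2"], ["file"]),
            (os.path.join("subdir", "subdir2"), [], ["file2"]),
        ]
```

Under pytest the `test_` function is collected automatically; the expected list mirrors the IPython output above.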

.github/scripts/produce-report.py (outdated, resolved)

@asmacdo (Member, Author) commented Dec 12, 2024

see if more recent AMI with newer than 3.7 python available (e.g. 3.10 to be future proof somewhat)

Amazon Linux 2023 comes with Python 3.9, which is sufficient. I briefly tried Ubuntu 22.04 (with 3.12), but that would add complexity, since amazon-efs-utils would have to be installed from source.

@asmacdo (Member, Author) commented Dec 16, 2024

Update:

  • The automation is incomplete:
    • The launch script sets up an EC2 instance with the scripts
    • The scripts must be executed by hand
    • The produced data must be retrieved by hand
  • With 8 jobs running in parallel, file indexing takes about 16 minutes
  • The file indexes total 499 MB
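A sketch of running per-directory indexing 8-wide with a thread pool. This is an assumption about how the parallelism could be structured, not the PR's code, and `index_directory` is a hypothetical stand-in for the real indexer. Threads are probably adequate here because the work is dominated by filesystem metadata calls, during which CPython releases the GIL.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def index_directory(path: str) -> tuple:
    """Hypothetical indexer: (path, entry_count, total_size) over everything os.walk lists as a file."""
    count = 0
    size = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                st = os.lstat(os.path.join(root, name))  # symlinks count with their own size
            except OSError:
                continue  # skip files that disappear or cannot be read
            count += 1
            size += st.st_size
    return path, count, size


def index_in_parallel(top: str, workers: int = 8) -> list:
    """Index each top-level directory under top, at most `workers` at a time."""
    with os.scandir(top) as it:
        dirs = sorted(e.path for e in it if e.is_dir(follow_symlinks=False))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map preserves input order, so results line up with `dirs`
        return list(pool.map(index_directory, dirs))
```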
Here's an example of the JSON output produced by the report generator:
{
  "total_size": 238294213,
  "file_count": 3639,
  "caches": {
    "pycache": {
      "total_size": 22782820,
      "file_count": 1431,
      "directories": [
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pkg_resources/extern/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pkg_resources/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pkg_resources/_vendor/jaraco/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pkg_resources/_vendor/jaraco/text/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pkg_resources/_vendor/importlib_resources/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pkg_resources/_vendor/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pkg_resources/_vendor/packaging/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pkg_resources/_vendor/pyparsing/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pkg_resources/_vendor/pyparsing/diagram/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pkg_resources/_vendor/more_itertools/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/setuptools/_distutils/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/setuptools/_distutils/command/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/setuptools/extern/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/setuptools/config/_validate_pyproject/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/setuptools/config/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/setuptools/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/setuptools/_vendor/jaraco/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/setuptools/_vendor/jaraco/text/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/setuptools/_vendor/importlib_resources/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/setuptools/_vendor/importlib_metadata/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/setuptools/_vendor/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/setuptools/_vendor/tomli/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/setuptools/_vendor/packaging/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/setuptools/_vendor/pyparsing/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/setuptools/_vendor/pyparsing/diagram/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/setuptools/_vendor/more_itertools/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/setuptools/command/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/_distutils_hack/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_internal/index/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_internal/cli/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_internal/req/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_internal/operations/build/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_internal/operations/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_internal/operations/install/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_internal/resolution/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_internal/resolution/legacy/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_internal/resolution/resolvelib/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_internal/models/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_internal/vcs/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_internal/locations/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_internal/commands/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_internal/network/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_internal/utils/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_internal/metadata/importlib/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_internal/metadata/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_internal/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_internal/distributions/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_vendor/webencodings/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_vendor/pkg_resources/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_vendor/idna/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_vendor/requests/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_vendor/msgpack/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_vendor/rich/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_vendor/certifi/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_vendor/colorama/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_vendor/colorama/tests/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_vendor/tenacity/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_vendor/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_vendor/distro/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_vendor/cachecontrol/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_vendor/cachecontrol/caches/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_vendor/tomli/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_vendor/packaging/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_vendor/pyparsing/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_vendor/pyparsing/diagram/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_vendor/urllib3/util/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_vendor/urllib3/contrib/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_vendor/urllib3/contrib/_securetransport/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_vendor/urllib3/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_vendor/urllib3/packages/backports/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_vendor/urllib3/packages/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_vendor/distlib/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_vendor/chardet/cli/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_vendor/chardet/metadata/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_vendor/chardet/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_vendor/resolvelib/compat/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_vendor/resolvelib/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_vendor/platformdirs/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_vendor/pygments/styles/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_vendor/pygments/filters/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_vendor/pygments/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_vendor/pygments/lexers/__pycache__",
        "asmacdo/venvs/my_venv/lib/python3.11/site-packages/pip/_vendor/pygments/formatters/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pkg_resources/extern/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pkg_resources/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pkg_resources/_vendor/jaraco/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pkg_resources/_vendor/jaraco/text/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pkg_resources/_vendor/importlib_resources/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pkg_resources/_vendor/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pkg_resources/_vendor/packaging/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pkg_resources/_vendor/pyparsing/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pkg_resources/_vendor/pyparsing/diagram/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pkg_resources/_vendor/more_itertools/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/setuptools/_distutils/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/setuptools/_distutils/command/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/setuptools/extern/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/setuptools/config/_validate_pyproject/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/setuptools/config/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/setuptools/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/setuptools/_vendor/jaraco/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/setuptools/_vendor/jaraco/text/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/setuptools/_vendor/importlib_resources/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/setuptools/_vendor/importlib_metadata/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/setuptools/_vendor/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/setuptools/_vendor/tomli/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/setuptools/_vendor/packaging/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/setuptools/_vendor/pyparsing/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/setuptools/_vendor/pyparsing/diagram/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/setuptools/_vendor/more_itertools/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/setuptools/command/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/_distutils_hack/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_internal/index/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_internal/cli/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_internal/req/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_internal/operations/build/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_internal/operations/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_internal/operations/install/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_internal/resolution/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_internal/resolution/legacy/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_internal/resolution/resolvelib/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_internal/models/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_internal/vcs/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_internal/locations/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_internal/commands/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_internal/network/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_internal/utils/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_internal/metadata/importlib/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_internal/metadata/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_internal/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_internal/distributions/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_vendor/webencodings/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_vendor/pkg_resources/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_vendor/idna/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_vendor/requests/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_vendor/msgpack/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_vendor/rich/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_vendor/certifi/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_vendor/colorama/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_vendor/colorama/tests/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_vendor/tenacity/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_vendor/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_vendor/distro/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_vendor/cachecontrol/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_vendor/cachecontrol/caches/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_vendor/tomli/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_vendor/packaging/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_vendor/pyparsing/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_vendor/pyparsing/diagram/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_vendor/urllib3/util/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_vendor/urllib3/contrib/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_vendor/urllib3/contrib/_securetransport/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_vendor/urllib3/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_vendor/urllib3/packages/backports/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_vendor/urllib3/packages/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_vendor/distlib/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_vendor/chardet/cli/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_vendor/chardet/metadata/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_vendor/chardet/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_vendor/resolvelib/compat/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_vendor/resolvelib/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_vendor/platformdirs/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_vendor/pygments/styles/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_vendor/pygments/filters/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_vendor/pygments/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_vendor/pygments/lexers/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/pip/_vendor/pygments/formatters/__pycache__",
        "asmacdo/venvs/myenv/lib/python3.11/site-packages/con_duct/__pycache__",
        "asmacdo/dandi-notebooks/000971/lernerlab/seiler_2024/__pycache__"
      ]
    },
    "user_cache": {
      "total_size": 10852503,
      "file_count": 81,
      "directories": [
        "asmacdo/.cache"
      ]
    },
    "yarn_cache": {
      "total_size": 0,
      "file_count": 0,
      "directories": []
    },
    "pip_cache": {
      "total_size": 10852503,
      "file_count": 80,
      "directories": [
        "asmacdo/.cache/pip"
      ]
    },
    "nwb_cache": {
      "total_size": 0,
      "file_count": 0,
      "directories": []
    }
  }
}

Notably, the reported total_size (238294213 bytes) is smaller than what du reports:
du --apparent: 247232941
du: 252841984
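Part of the gap may come from what each number counts: `du --apparent-size` also adds the sizes of directory entries themselves and counts hard-linked files once, while plain `du` reports allocated blocks. If the report sums only regular-file sizes, it would be expected to come out lowest. The Python sketch below computes all three views for a tree so they can be compared directly; it is a diagnostic assumption, not how `produce-report.py` works, and `size_views` is a hypothetical name.

```python
import os
import stat


def size_views(top: str) -> dict:
    """Compare three ways of measuring a tree (symlinks are not followed)."""
    files_only = 0  # sum of st_size over regular-file paths, like a naive walk
    apparent = 0    # like `du --apparent-size`: every entry, hard links counted once
    allocated = 0   # like `du`: allocated 512-byte blocks, hard links counted once
    seen = set()    # (st_dev, st_ino) pairs already counted
    for root, dirs, files in os.walk(top):
        for path in [root] + [os.path.join(root, n) for n in dirs + files]:
            try:
                st = os.lstat(path)
            except OSError:
                continue
            if stat.S_ISREG(st.st_mode):
                files_only += st.st_size  # counted once per path
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue  # hard link, or a directory already counted as a child
            seen.add(key)
            apparent += st.st_size
            allocated += st.st_blocks * 512
    return {"files_only": files_only, "apparent": apparent, "allocated": allocated}
```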

Python 3.9 is new enough. Ubuntu was not ideal because mounting EFS
requires amazon-efs-utils, which is available via the package manager on Amazon Linux
but must be built from source on Ubuntu.
gzip will only be needed if we want to upload to S3.
=== Do not change lines below ===
{
 "chain": [],
 "cmd": "black .github/scripts/calculate-directory-stats.py",
 "exit": 0,
 "extra_inputs": [],
 "inputs": [],
 "outputs": [
  ".github/scripts/calculate-directory-stats.py"
 ],
 "pwd": "."
}
^^^ Do not change lines above ^^^
Holding the entire file index in memory would be risky, especially with
zarr files.
Whether there is an error or not, write all fields for each file.
This allows us to read the TSV file and unpack the same number of values
for each line.
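The two notes above (stream the index rather than holding it in memory, and always emit every field) can be sketched as follows. The column set, the `error` column, and the names `write_index`/`read_index` are assumptions for illustration, not the actual `create-file-index.py` format. Using the `csv` module with a tab delimiter means paths containing tabs or newlines get quoted instead of corrupting the row structure.

```python
import csv
import os
from datetime import datetime, timezone

FIELDS = ("path", "size", "modified", "error")  # hypothetical column set


def write_index(directory: str, out) -> int:
    """Stream one TSV row per file to `out`; every row has all FIELDS, even on error."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(FIELDS)
    rows = 0
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            rel = os.path.relpath(path, directory)
            try:
                st = os.lstat(path)
                size = st.st_size
                modified = datetime.fromtimestamp(st.st_mtime, tz=timezone.utc).isoformat()
                error = ""
            except OSError as exc:
                # Keep the column count fixed so readers can always unpack len(FIELDS) values
                size, modified, error = "", "", type(exc).__name__
            writer.writerow((rel, size, modified, error))
            rows += 1
    return rows


def read_index(inp):
    """Yield (path, size or None, error) per row; unpacking works because rows are uniform."""
    reader = csv.reader(inp, delimiter="\t")
    next(reader)  # skip the header row
    for path, size, _modified, error in reader:
        yield path, (int(size) if size else None), error
```

When writing to disk, opening the output with `open(path, "w", newline="")` (or `gzip.open(path, "wt", newline="")`) follows the `csv` module's recommendation on newline handling.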
Development

Successfully merging this pull request may close these issues.

User data quota cron job
3 participants