GH action to generate report #199

Draft: wants to merge 96 commits into base: main

Commits (96)
30e9cbf
Initial commit to add GH action to generate report
asmacdo Sep 25, 2024
3bcba91
Assume Jupyterhub Provisioning Role
asmacdo Sep 25, 2024
b5cdcf3
Fixup: indent
asmacdo Sep 25, 2024
6e118cc
Rename job
asmacdo Sep 25, 2024
5062f08
Add assumed role to update-kubeconfig
asmacdo Sep 25, 2024
d21a3a9
No need to add ProvisioningRole to masters
asmacdo Sep 25, 2024
403028f
Deploy a pod to the cluster, and schedule with Karpenter
asmacdo Sep 25, 2024
92b9925
Fixup: correct path to pod manifest
asmacdo Sep 25, 2024
478a31f
Fixup again ugh, rename file
asmacdo Sep 25, 2024
9db914e
Delete Pod even if previous step times out
asmacdo Sep 25, 2024
8458d01
Hack out initial du
asmacdo Oct 11, 2024
7999455
tmp comment out job deployment, test dockerhub build
asmacdo Nov 8, 2024
d2e65de
Fixup hyphens for image name
asmacdo Nov 8, 2024
5e9e7df
Write file to output location
asmacdo Nov 8, 2024
d33973c
use kubectl cp to retrieve report
asmacdo Nov 8, 2024
98fecbc
Combine run blocks to use vars
asmacdo Nov 8, 2024
40ae0e8
Mount efs and pass arg to du script
asmacdo Nov 8, 2024
4c978f7
Comment out repo pushing, lets see if the report runs
asmacdo Nov 8, 2024
6bd7b82
Restrict job to asmacdo for testing
asmacdo Nov 8, 2024
73c3e80
Sanity check. Just list the directories
asmacdo Nov 8, 2024
685dfb1
Job was deployed, but never assigned to node, back to sanity check
asmacdo Nov 8, 2024
f6afefc
change from job to pod
asmacdo Nov 8, 2024
6dad759
deploy pod to same namespace as pvc
asmacdo Nov 8, 2024
3a33937
Use ns in action
asmacdo Nov 8, 2024
1ffb1c9
increase timeout to 60s
asmacdo Nov 8, 2024
58e0753
fixup: image name in manifest
asmacdo Nov 8, 2024
6767755
increase timeout to 150
asmacdo Nov 8, 2024
cbf951e
override entrypoint so i can debug with exec
asmacdo Nov 8, 2024
59eb045
bound /home actually meant path was /home/home/asmacdo
asmacdo Nov 8, 2024
db140d5
Create output dir prior to writing report
asmacdo Nov 8, 2024
f90176a
pod back to job
asmacdo Nov 11, 2024
c31ccdd
Fixup use the correct job api
asmacdo Nov 11, 2024
3ee9d9f
Add namespace to pod retrieval
asmacdo Nov 11, 2024
d7f81ba
write directly to pv to test job
asmacdo Nov 11, 2024
0856baa
fixup script fstring
asmacdo Nov 11, 2024
5301b1b
no retry on failure, we were spinning up 5 pods, let's just fail 1 time
asmacdo Nov 11, 2024
7384274
Fixup backup limit job not template
asmacdo Nov 11, 2024
8e81e38
Initial report
asmacdo Nov 11, 2024
cb5db49
disable report
asmacdo Nov 11, 2024
5d188a7
deploy ec2 instance directly
asmacdo Dec 2, 2024
2f39e9c
Update AMI image
asmacdo Dec 2, 2024
3a21106
update sg and subnet
asmacdo Dec 2, 2024
6a54da0
terminate even if job fails
asmacdo Dec 2, 2024
87075fb
debug: print public ip
asmacdo Dec 2, 2024
48c7f35
explicitly allocate public ip for ec2 instance
asmacdo Dec 2, 2024
743359e
Add WIP scripts
asmacdo Dec 6, 2024
0ba12f2
rm old unused
asmacdo Dec 6, 2024
2893ab2
initial commit of scripts
asmacdo Dec 6, 2024
5ef8f80
clean up launch script
asmacdo Dec 6, 2024
b02720e
make script executable
asmacdo Dec 6, 2024
ae98909
fixup cleanup script
asmacdo Dec 6, 2024
7e80e4a
add a name to elastic ip (for easier manual cleanup)
asmacdo Dec 6, 2024
f2a4116
Exit on fail
asmacdo Dec 6, 2024
6ffef17
Add permission for aws ec2 wait instance-status-ok
asmacdo Dec 6, 2024
20cc085
Upload scripts to instance
asmacdo Dec 6, 2024
76477df
explicitly return
asmacdo Dec 6, 2024
b38ded1
output session variables to file
asmacdo Dec 11, 2024
f795570
modify cleanup script to retrieve instance from temporary file
asmacdo Dec 11, 2024
f8a92b2
All ec2 permissions granted
asmacdo Dec 11, 2024
e9726df
Add EFS mount (hardcoded)
asmacdo Dec 11, 2024
c6e92f9
No pager for termination
asmacdo Dec 11, 2024
17d77cd
force pseudo-terminal, otherwise hangs after yum install
asmacdo Dec 11, 2024
2246af5
Add doublequotes to variable usage for proper expansion
asmacdo Dec 11, 2024
b49b7b5
Fixup -t goes on ssh, not scp
asmacdo Dec 11, 2024
584ac4d
Mount as a single command, since we dont have access to pty
asmacdo Dec 11, 2024
4a700e5
add todos for manual steps
asmacdo Dec 11, 2024
6339924
Disable job for now
asmacdo Dec 11, 2024
17130ef
Update AMI to ubuntu
asmacdo Dec 12, 2024
cc29df5
Roll back to AL 2023
asmacdo Dec 12, 2024
295361c
drop gzip, just write json
asmacdo Dec 13, 2024
a667c04
include target dir in relative paths
asmacdo Dec 13, 2024
a91beb0
Second script will not produce user report, but directory stats json
asmacdo Dec 13, 2024
9371982
initial algorithm hackout
asmacdo Dec 13, 2024
8cead5a
Clean up and refactor for simplicity
asmacdo Dec 13, 2024
86a7c72
Add basic tests
asmacdo Dec 13, 2024
fc1cab1
test multiple directories in root
asmacdo Dec 13, 2024
2308aed
comment about [:-1]
asmacdo Dec 13, 2024
84754fe
support abspaths
asmacdo Dec 14, 2024
a1427ac
[DATALAD RUNCMD] blacken
asmacdo Dec 14, 2024
16e4890
test propagation with files in all dirs
asmacdo Dec 14, 2024
528833d
Write files to disk as they are inspected
asmacdo Dec 15, 2024
3c0e7f7
Comment out column headers in output
asmacdo Dec 15, 2024
260c69d
Write all fields for every file
asmacdo Dec 15, 2024
87dd8ca
Convert to reading tsv
asmacdo Dec 15, 2024
e0e0a32
Fixup: update test to match tsv-read data
asmacdo Dec 15, 2024
41aaa2a
update for renamed script
asmacdo Dec 15, 2024
25e27eb
install pip
asmacdo Dec 15, 2024
204b70e
install parallel
asmacdo Dec 15, 2024
64d653e
install dependencies in launch script
asmacdo Dec 15, 2024
6475f11
Output to tmp, accept only 1 arg, target dir
asmacdo Dec 15, 2024
b67c063
add up sizes
asmacdo Dec 16, 2024
3241473
print useful info as index is created
asmacdo Dec 16, 2024
f4eb101
dont fail if output dir exists
asmacdo Dec 16, 2024
13e0e75
Create a report dict with only relevant stats
asmacdo Dec 16, 2024
a7e6991
output data reports
asmacdo Dec 20, 2024
845df00
Remove unused
asmacdo Jan 17, 2025
64 changes: 1 addition & 63 deletions .aws/terraform-jupyterhub-provisioning-policies.json
@@ -4,69 +4,7 @@
{
"Effect": "Allow",
"Action": [
"ec2:AllocateAddress",
"ec2:AssociateAddress",
"ec2:AssociateRouteTable",
"ec2:AssociateVpcCidrBlock",
"ec2:AttachInternetGateway",
"ec2:AttachNetworkInterface",
"ec2:AuthorizeSecurityGroupEgress",
"ec2:AuthorizeSecurityGroupIngress",
"ec2:CreateInternetGateway",
"ec2:CreateLaunchTemplate",
"ec2:CreateLaunchTemplateVersion",
"ec2:CreateNatGateway",
"ec2:CreateNetworkAcl",
"ec2:CreateNetworkAclEntry",
"ec2:CreateNetworkInterface",
"ec2:CreateNetworkInterfacePermission",
"ec2:CreateRoute",
"ec2:CreateRouteTable",
"ec2:CreateSecurityGroup",
"ec2:CreateSubnet",
"ec2:CreateTags",
"ec2:CreateVpc",
"ec2:DeleteInternetGateway",
"ec2:DeleteLaunchTemplate",
"ec2:DeleteLaunchTemplateVersions",
"ec2:DeleteNatGateway",
"ec2:DeleteNetworkAcl",
"ec2:DeleteNetworkAclEntry",
"ec2:DeleteNetworkInterface",
"ec2:DeleteRoute",
"ec2:DeleteRouteTable",
"ec2:DeleteSecurityGroup",
"ec2:DeleteSubnet",
"ec2:DeleteTags",
"ec2:DeleteVpc",
"ec2:DescribeAddresses",
"ec2:DescribeAddressesAttribute",
"ec2:DescribeAvailabilityZones",
"ec2:DescribeInternetGateways",
"ec2:DescribeLaunchTemplateVersions",
"ec2:DescribeLaunchTemplates",
"ec2:DescribeNatGateways",
"ec2:DescribeNetworkAcls",
"ec2:DescribeNetworkInterfacePermissions",
"ec2:DescribeNetworkInterfaces",
"ec2:DescribeRouteTables",
"ec2:DescribeSecurityGroupRules",
"ec2:DescribeSecurityGroups",
"ec2:DescribeSubnets",
"ec2:DescribeVpcAttribute",
"ec2:DescribeVpcs",
"ec2:DetachInternetGateway",
"ec2:DetachNetworkInterface",
"ec2:DisassociateAddress",
"ec2:DisassociateRouteTable",
"ec2:DisassociateVpcCidrBlock",
"ec2:ModifyNetworkInterfaceAttribute",
"ec2:ModifyVpcAttribute",
"ec2:ReleaseAddress",
"ec2:ReplaceRoute",
"ec2:RevokeSecurityGroupEgress",
"ec2:RevokeSecurityGroupIngress",
"ec2:RunInstances",
"ec2:*",
"ecr-public:GetAuthorizationToken",
"eks:*",
"elasticfilesystem:CreateFileSystem",
193 changes: 193 additions & 0 deletions .github/scripts/calculate-directory-stats.py
@@ -0,0 +1,193 @@
#!/usr/bin/env python3

import os
import csv
import json
import sys
import unittest
from collections import defaultdict
from pathlib import Path
from pprint import pprint
from typing import Iterable


def propagate_dir(stats, current_parent, previous_parent):
assert os.path.isabs(current_parent) == os.path.isabs(
previous_parent
), "current_parent and previous_parent must both be abspath or both be relpath"
highest_common = os.path.commonpath([current_parent, previous_parent])
assert highest_common, "highest_common must either be a target directory or /"

path_to_propagate = os.path.relpath(previous_parent, highest_common)
    # leave off the last element so we don't propagate into the path we are propagating from
nested_dir_list = path_to_propagate.split(os.sep)[:-1]
# Add each dir count to all ancestors up to highest common dir
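    # e.g. propagate_dir(stats, "a", "a/b/c") folds the counts for "a/b/c"
    # into "a/b", then folds "a/b" into the highest common dir "a"
    # (see test_propagate_dir below)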
while nested_dir_list:
working_dir = os.path.join(highest_common, *nested_dir_list)
stats[working_dir]["file_count"] += stats[previous_parent]["file_count"]
stats[working_dir]["total_size"] += stats[previous_parent]["total_size"]
nested_dir_list.pop()
previous_parent = working_dir
stats[highest_common]["file_count"] += stats[previous_parent]["file_count"]
stats[highest_common]["total_size"] += stats[previous_parent]["total_size"]


def generate_directory_statistics(data: Iterable[str]):
# Assumes dirs are listed depth first (files are listed prior to directories)
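    # e.g. rows whose parent is "a/b" appear before rows whose parent is "a/b/c";
    # see the sample data in test_generate_directory_statistics below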

stats = defaultdict(lambda: {"total_size": 0, "file_count": 0})
previous_parent = ""
for filepath, size, modified, created, error in data:
# TODO if error is not None:
this_parent = os.path.dirname(filepath)
stats[this_parent]["file_count"] += 1
stats[this_parent]["total_size"] += int(size)

if previous_parent == this_parent:
continue
# going deeper
elif not previous_parent or previous_parent == os.path.dirname(this_parent):
previous_parent = this_parent
continue
else: # previous dir done
propagate_dir(stats, this_parent, previous_parent)
previous_parent = this_parent

    # Run a final time with the root directory as the current parent.
    # During the final run, the leading dir cannot be an empty string, since
    # propagate_dir requires both paths to be abspath or both to be relpath.
leading_dir = previous_parent.split(os.sep)[0] or "/"
propagate_dir(stats, leading_dir, previous_parent)
return stats


def iter_file_metadata(file_path):
"""
Reads a tsv and returns an iterable that yields one row of file metadata at
a time, excluding comments.
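    Each row is expected to contain: filepath, size, modified, created, error
    (the fields unpacked by generate_directory_statistics).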
"""
file_path = Path(file_path)
with file_path.open(mode="r", newline="", encoding="utf-8") as file:
reader = csv.reader(file, delimiter="\t")
for row in reader:
# Skip empty lines or lines starting with '#'
if not row or row[0].startswith("#"):
continue
yield row

def update_stats(stats, directory, stat):
stats["total_size"] += stat["total_size"]
stats["file_count"] += stat["file_count"]

    # Per-cache stats track their directories, but the report as a whole does not
if stats.get("directories") is not None:
stats["directories"].append(directory)

def main():
if len(sys.argv) != 2:
print("Usage: python script.py <input_json_file>")
sys.exit(1)

input_tsv_file = sys.argv[1]
username = input_tsv_file.split("-index.tsv")[0]

data = iter_file_metadata(input_tsv_file)
stats = generate_directory_statistics(data)
cache_types = ["pycache", "user_cache", "yarn_cache", "pip_cache", "nwb_cache"]
report_stats = {
"total_size": 0,
"file_count": 0,
"caches": {
cache_type: {"total_size": 0, "file_count": 0, "directories": []}
for cache_type in cache_types
}
}

# print(f"{directory}: File count: {stat['file_count']}, Total Size: {stat['total_size']}")
for directory, stat in stats.items():
if directory.endswith("__pycache__"):
update_stats(report_stats["caches"]["pycache"], directory, stat)
elif directory.endswith(f"{username}/.cache"):
update_stats(report_stats["caches"]["user_cache"], directory, stat)
elif directory.endswith(".cache/yarn"):
update_stats(report_stats["caches"]["yarn_cache"], directory, stat)
elif directory.endswith(".cache/pip"):
update_stats(report_stats["caches"]["pip_cache"], directory, stat)
elif directory == username:
update_stats(report_stats, username, stat)

OUTPUT_DIR = "/home/austin/hub-user-reports/"
os.makedirs(OUTPUT_DIR, exist_ok=True)
with open(f"{OUTPUT_DIR}{username}-report.json", "w") as out:
json.dump(report_stats, out)


sorted_dirs = sorted(stats.items(), key=lambda x: x[1]['total_size'], reverse=True)
print(f"Finished {username} with Total {report_stats["total_size"]}")


class TestDirectoryStatistics(unittest.TestCase):
def test_propagate_dir(self):
stats = defaultdict(lambda: {"total_size": 0, "file_count": 0})
stats["a/b/c"] = {"total_size": 100, "file_count": 3}
stats["a/b"] = {"total_size": 10, "file_count": 0}
stats["a"] = {"total_size": 1, "file_count": 0}

propagate_dir(stats, "a", "a/b/c")
self.assertEqual(stats["a"]["file_count"], 3)
self.assertEqual(stats["a/b"]["file_count"], 3)
self.assertEqual(stats["a"]["total_size"], 111)

def test_propagate_dir_abs_path(self):
stats = defaultdict(lambda: {"total_size": 0, "file_count": 0})
stats["/a/b/c"] = {"total_size": 0, "file_count": 3}
stats["/a/b"] = {"total_size": 0, "file_count": 0}
stats["/a"] = {"total_size": 0, "file_count": 0}

propagate_dir(stats, "/a", "/a/b/c")
self.assertEqual(stats["/a"]["file_count"], 3)
self.assertEqual(stats["/a/b"]["file_count"], 3)

def test_propagate_dir_files_in_all(self):
stats = defaultdict(lambda: {"total_size": 0, "file_count": 0})
stats["a/b/c"] = {"total_size": 0, "file_count": 3}
stats["a/b"] = {"total_size": 0, "file_count": 2}
stats["a"] = {"total_size": 0, "file_count": 1}

propagate_dir(stats, "a", "a/b/c")
self.assertEqual(stats["a"]["file_count"], 6)
self.assertEqual(stats["a/b"]["file_count"], 5)

def test_generate_directory_statistics(self):
sample_data = [
("a/b/file3.txt", 3456, "2024-12-01", "2024-12-02", "OK"),
("a/b/c/file1.txt", 1234, "2024-12-01", "2024-12-02", "OK"),
("a/b/c/file2.txt", 2345, "2024-12-01", "2024-12-02", "OK"),
("a/b/c/d/file4.txt", 4567, "2024-12-01", "2024-12-02", "OK"),
("a/e/file3.txt", 5678, "2024-12-01", "2024-12-02", "OK"),
("a/e/f/file1.txt", 6789, "2024-12-01", "2024-12-02", "OK"),
("a/e/f/file2.txt", 7890, "2024-12-01", "2024-12-02", "OK"),
("a/e/f/g/file4.txt", 8901, "2024-12-01", "2024-12-02", "OK"),
]
stats = generate_directory_statistics(sample_data)
self.assertEqual(stats["a/b/c/d"]["file_count"], 1)
self.assertEqual(stats["a/b/c"]["file_count"], 3)
self.assertEqual(stats["a/b"]["file_count"], 4)
self.assertEqual(stats["a/e/f/g"]["file_count"], 1)
self.assertEqual(stats["a/e/f"]["file_count"], 3)
self.assertEqual(stats["a/e"]["file_count"], 4)
self.assertEqual(stats["a"]["file_count"], 8)


if __name__ == "__main__":
if len(sys.argv) > 1 and sys.argv[1] == "test":
unittest.main(
argv=sys.argv[:1]
) # Run tests if "test" is provided as an argument
else:
try:
main()
except Exception as e:
# print(f"FAILED ------------------------------ {sys.argv[1]}")
# raise(e)
pass
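For reference, a minimal invocation sketch (the index filename is illustrative; the script derives the username from a <username>-index.tsv argument, and runs its unittest suite when given "test" instead):

# generate a per-user report from an index file (illustrative filename)
python3 calculate-directory-stats.py asmacdo-index.tsv

# run the built-in tests
python3 calculate-directory-stats.py test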
63 changes: 63 additions & 0 deletions .github/scripts/cleanup-ec2.sh
@@ -0,0 +1,63 @@
#!/usr/bin/env bash

set -e
Review comment (Member), suggested change:
  set -e  ->  set -eu


# Load environment variables from the file if they are not already set
ENV_FILE=".ec2-session.env"
Review comment (Member), suggested change:
  ENV_FILE=".ec2-session.env"  ->  ENV_FILE="/run/user/$(id -u)/ec2-session.env"

if [ -f "$ENV_FILE" ]; then
echo "Loading environment variables from $ENV_FILE..."
source "$ENV_FILE"
else
echo "Warning: Environment file $ENV_FILE not found."
fi

# Ensure required environment variables are set
if [ -z "$INSTANCE_ID" ]; then
echo "Error: INSTANCE_ID is not set. Cannot proceed with cleanup."
exit 1
fi

if [ -z "$ALLOC_ID" ]; then
echo "Error: ALLOC_ID is not set. Cannot proceed with cleanup."
exit 1
fi

# Check for AWS CLI and credentials
if ! command -v aws &>/dev/null; then
echo "Error: AWS CLI is not installed. Please install it and configure your credentials."
exit 1
fi

if ! aws sts get-caller-identity &>/dev/null; then
echo "Error: Unable to access AWS. Ensure your credentials are configured correctly."
exit 1
fi

# Terminate EC2 instance
echo "Terminating EC2 instance with ID: $INSTANCE_ID..."
if aws ec2 terminate-instances --instance-ids "$INSTANCE_ID" --no-cli-pager; then
echo "Instance termination initiated. Waiting for the instance to terminate..."
if aws ec2 wait instance-terminated --instance-ids "$INSTANCE_ID"; then
echo "Instance $INSTANCE_ID has been successfully terminated."
else
echo "Warning: Instance $INSTANCE_ID may not have terminated correctly."
fi
else
echo "Warning: Failed to terminate instance $INSTANCE_ID. It may already be terminated."
fi

# Release Elastic IP
echo "Releasing Elastic IP with Allocation ID: $ALLOC_ID..."
if aws ec2 release-address --allocation-id "$ALLOC_ID"; then
echo "Elastic IP with Allocation ID $ALLOC_ID has been successfully released."
else
echo "Warning: Failed to release Elastic IP with Allocation ID $ALLOC_ID. It may already be released."
fi

# Cleanup environment file
if [ -f "$ENV_FILE" ]; then
echo "Removing environment file $ENV_FILE..."
rm -f "$ENV_FILE"
fi

echo "Cleanup complete."