Ingest Ladybird checksums into DCS for validation purposes #2963

Closed
3 tasks done
sshetenhelm opened this issue Nov 5, 2024 · 18 comments

@sshetenhelm

sshetenhelm commented Nov 5, 2024

Story
Prior to completing the work in #2962, we will need to ingest the Ladybird image checksums into DCS storage.

Acceptance

  • Create logic for type of checksum
  • Ingest Ladybird checksums into DCS for validation jobs
  • Rake task to update checksum values
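
One possible approach to the first acceptance item is to infer the checksum type from the length of its hex digest. This is only a sketch under that assumption; `checksum_type` and `HEX_LENGTHS` are hypothetical names, not necessarily how the management app decides.

```ruby
# Hypothetical helper: infer the digest algorithm from the hex string length.
HEX_LENGTHS = { 32 => :md5, 64 => :sha256, 128 => :sha512 }.freeze

def checksum_type(value)
  # Anything that is not a pure hex string (e.g. "null") has no type.
  return nil unless value.is_a?(String) && value.match?(/\A\h+\z/)
  HEX_LENGTHS[value.length]
end

checksum_type('9b79f8d6d5ee9a2da7855c0334eaddcf') # => :md5
checksum_type('null')                             # => nil
```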
@martinlovell
Collaborator

martinlovell commented Nov 8, 2024

All checksums from Ladybird are in /data/10 on the management-workers.

The files are in TSV format, e.g.:

15473151	9b79f8d6d5ee9a2da7855c0334eaddcf	null	pctybr_inv_4641_ro.tif	42532396	PRIMARY
15473154	6e531052accb00a39f1e5a83f8cadbb1	null	pctybr_inv_4705_ro.tif	68008516	PRIMARY
15473148	6bc58a223cac0d72a6e392dab7f89fda	null	pctybr_inv_4628_ro.tif	69626456	PRIMARY

The columns are: child OID, MD5, SHA256, filename, file size, label.

You can ignore filename and label.

The files are named collection_#.tsv, e.g. collection_1.tsv (numbered 1-27).
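
As a rough illustration of the layout described above, the sketch below parses one row into the fields the ingest needs. It is not the actual rake task; `parse_ladybird_row` and the `LadybirdChecksum` struct are hypothetical names, and treating the literal string `null` as a missing value is an assumption based on the sample rows.

```ruby
# Hypothetical sketch, not the code from yul-dc-management.
LadybirdChecksum = Struct.new(:child_oid, :md5, :sha256, :file_size)

# Columns (tab-separated): child OID, MD5, SHA256, filename, file size, label.
# Filename and label are ignored, per the note above.
def parse_ladybird_row(line)
  oid, md5, sha256, _filename, file_size, _label = line.chomp.split("\t", -1)
  # Assumption: the literal string "null" marks a missing checksum.
  clean = ->(value) { value.nil? || value.empty? || value == 'null' ? nil : value }
  LadybirdChecksum.new(Integer(oid), clean.call(md5), clean.call(sha256), Integer(file_size))
end

row = parse_ladybird_row(
  "15473151\t9b79f8d6d5ee9a2da7855c0334eaddcf\tnull\tpctybr_inv_4641_ro.tif\t42532396\tPRIMARY"
)
```

For the sample row, `row.md5` holds the MD5 string and `row.sha256` is `nil`.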

@martinlovell
Collaborator

@K8Sewell

Deployed to Test with release v2.72.9

@K8Sewell

Tried to run the rake task in the Test container, but I am not sure it worked: I saw no change in the number of child objects without a checksum. I believe I need the files_glob, but I don't think I have access to it, so I will need to ask for that file.


@martinlovell
Collaborator

martinlovell commented Nov 20, 2024

Do any of them have a sha256_checksum or md5_checksum?
I ran it last week.

rake child_objects:load_ladybird_checksums['/data/10/collection*tsv']
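
To make the role of the glob argument concrete, here is a minimal sketch of how a task could walk every file the pattern matches. `each_checksum_row` is a hypothetical helper written for illustration; the real task in yul-dc-management may be structured differently.

```ruby
# Hypothetical sketch: yield every non-blank row from every file the glob matches.
def each_checksum_row(files_glob)
  return enum_for(:each_checksum_row, files_glob) unless block_given?

  # Sort so the files are visited in a stable order.
  Dir.glob(files_glob).sort.each do |path|
    File.foreach(path) do |line|
      next if line.strip.empty?
      # Keep trailing empty fields so the column positions stay fixed.
      yield path, line.chomp.split("\t", -1)
    end
  end
end
```

With the path from this ticket, a caller would use `each_checksum_row('/data/10/collection*tsv') { |path, fields| ... }`. If the pattern matches no files, the block never runs, which would look like a run that changed nothing.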

@K8Sewell

K8Sewell commented Nov 21, 2024

Taking back to In Progress.

Next Steps post pairing Martin and Kait:

rake task:

  • make more efficient by batching (removed from acceptance because this is a one-time rake task)
  • add logs
  • add instructions to wiki for where to find the files_glob (also removed, because this is a one-time task and notes on how to run it are available in this ticket)

integrity check:

  • make ticket to decide on, and then implement, a process for when incorrect checksum data is retrieved from Ladybird

decide which checksum attribute will be 'source of truth':

  • options: checksum, sha512_checksum, sha256_checksum, md5_checksum
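
For discussion purposes, the sketch below shows what an integrity check could look like once a 'source of truth' is chosen. The preference order (strongest digest first) is an assumption, since this ticket leaves that decision open; `verify_file` and `PREFERENCE` are hypothetical names, and the generic `checksum` attribute is left out because its algorithm is not stated here.

```ruby
require 'digest'

# Assumption: the strongest available digest wins. The ticket has not decided this.
PREFERENCE = {
  sha512_checksum: Digest::SHA512,
  sha256_checksum: Digest::SHA256,
  md5_checksum: Digest::MD5
}.freeze

# stored: hash of attribute name => hex digest (or nil when absent).
# Returns [attribute_used, matches?], or nil when no checksum is stored.
def verify_file(path, stored)
  attribute, digest_class = PREFERENCE.find { |attr, _| stored[attr] }
  return nil unless attribute

  actual = digest_class.file(path).hexdigest
  [attribute, actual == stored[attribute].downcase]
end
```

A Ladybird row that carries only an MD5 would be checked against `md5_checksum`; a row with nothing stored returns `nil` so the caller can decide how to report it.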

@K8Sewell K8Sewell self-assigned this Nov 21, 2024
@K8Sewell

Run rake task in UAT on Tuesday, November 26 at end of day, 5 pm ET.

@K8Sewell

Ticket to run rake task on prod - #2973

@K8Sewell

PR ready for review - yalelibrary/yul-dc-management#1459

@K8Sewell

K8Sewell commented Dec 2, 2024

Deployed to Test with release v2.73.1

@K8Sewell

K8Sewell commented Dec 6, 2024

PR ready for review - yalelibrary/yul-dc-management#1461

@K8Sewell

K8Sewell commented Dec 6, 2024

Deployed to Test with release v2.73.4

@K8Sewell

K8Sewell commented Dec 6, 2024

Working on Test. Will promote to UAT.

@K8Sewell

K8Sewell commented Dec 9, 2024

Are the data shares at Yale set up the same way on UAT as they are on Test? Asking in case I need to update where the rake task expects the TSV files to be.

@martinlovell
Collaborator

They are the same for Test and UAT/prod, so run the rake task the same way (same path to the files).

@martinlovell
Collaborator

martinlovell commented Dec 16, 2024

Running on UAT with

nohup rake child_objects:load_ladybird_checksums['/data/10/collection*tsv'] > /data/10/ladybird_checksum_ingest.log &

@martinlovell
Collaborator

The workers were restarted on December 16, 2024 at 14:20.
The last log entry was 2024-12-16 19:22:57.654086 I [232:6200] Rails -- Number of processed children: 1465622
So, I restarted it, logging to /data/10/ladybird_checksum_ingest_uat_2024-17.log

nohup rake child_objects:load_ladybird_checksums['/data/10/collection*tsv'] > /data/10/ladybird_checksum_ingest_uat_2024-17.log &

@martinlovell
Collaborator

It finished! Logs are in /data/10/ladybird_checksum_ingest_uat_2024-17.log

@jillpe jillpe closed this as completed Dec 18, 2024