Implement downloading archived item + QA runs as multi-WACZ #1933
Not working yet - doesn't trigger download
This commit removes some safety checks on the btrix-dropdown containing the menu items. In Brave, Chrome, and Firefox, the behavior the checks were meant to guard against no longer occurs after the change.
Looks good to me! We could stand to remove the other `preventDefault` calls on click events on overflow dropdowns in list views too, since I think that now we've migrated to the new tables, the links no longer cover the dropdown menu. That could be a cleanup item, I suppose.
frontend/src/pages/org/archived-item-detail/archived-item-detail.ts
…il.ts Co-authored-by: Henry Wilkinson <[email protected]>
Yeah, I think maybe that's best as a separate cleanup task? Thank you!
frontend/src/pages/org/archived-item-detail/archived-item-detail.ts
…il.ts Co-authored-by: sua yoo <[email protected]>
@SuaYoo thanks for catching the typo!
- add test for endpoint
- use new endpoint in frontend
Also added download of QA run data as multi-WACZ, since currently only the first WACZ is downloaded.
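For context, "multi-WACZ" here roughly means one outer ZIP holding every WACZ that belongs to the item (or QA run), rather than only the first file. The sketch below shows the idea with Python's standard `zipfile` module, building the archive in memory; the PR itself streams the output with stream-zip, and the helper name `build_multi_wacz` and the minimal `datapackage.json` layout are assumptions for illustration only.

```python
import io
import json
import zipfile
from typing import Dict


def build_multi_wacz(waczs: Dict[str, bytes]) -> bytes:
    """Pack several WACZ files into one outer ZIP (hypothetical helper).

    `waczs` maps each inner filename (e.g. "<crawl-id>.wacz") to its bytes.
    """
    buf = io.BytesIO()
    # WACZ files are already ZIPs, so store them without recompressing
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as outer:
        resources = []
        for name, data in waczs.items():
            outer.writestr(name, data)
            resources.append({"name": name, "path": name, "bytes": len(data)})
        # Minimal package descriptor listing every nested WACZ (assumed layout)
        outer.writestr("datapackage.json", json.dumps({"resources": resources}, indent=2))
    return buf.getvalue()
```

Storing rather than deflating the nested files avoids spending CPU on data that is already compressed.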
Thanks for this!
@SuaYoo mind taking another look at this when you have a moment?
UI and other changes look good (and hopefully the QA downloads as well).
Investigating potential crc32 mismatches in the generated WACZs, though I think that may have to do with the streaming zip library we're using and not related to the changes in this PR itself.
Tested the UI, looks good, haven't tested the feature myself. Other issues are best dealt with by other folks, approving my end of things! :)
- update to a new version of 'stream-zip' that does not require crc32 to produce a streamed WACZ
- don't specify crc32 in streaming WACZ, as it may be incorrect or missing; compute it on the fly instead
- update types for latest 'stream-zip'
I believe this should now be fixed, with a new custom fork of stream-zip! The crawler was actually computing crc32 incorrectly; with the latest fix, crc32 is no longer required at all.
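Computing crc32 on the fly works because CRC-32 can be updated chunk by chunk, so a streaming ZIP writer can emit the file data first and write the checksum and size in a data descriptor after it. A minimal sketch with the standard `zlib` module, assuming a hypothetical `Crc32Tracker` wrapper (this is not stream-zip's code):

```python
import zlib
from typing import Iterable, Iterator


class Crc32Tracker:
    """Hypothetical helper: pass chunks through unchanged while computing
    their CRC-32 and total size as they stream by."""

    def __init__(self) -> None:
        self.crc32 = 0  # zlib's starting value for a running CRC-32
        self.size = 0

    def wrap(self, chunks: Iterable[bytes]) -> Iterator[bytes]:
        for chunk in chunks:
            # zlib.crc32 takes the previous value to continue the checksum
            self.crc32 = zlib.crc32(chunk, self.crc32)
            self.size += len(chunk)
            yield chunk
```

Once the wrapped generator has been fully consumed, `crc32` and `size` hold the values for the entry's data descriptor, so nothing has to be known (or trusted) up front.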
One other thing to consider: what do we want the default names of the WACZs to be? Should it just be the crawl id, or perhaps the workflow name to be more user-friendly? (Can also revisit later)
I think crawl id is okay for now? It's similar to the WACZ filenames in the Files list (with prefix and without the instance suffix). With the workflow name it might get a bit long?
- remove unused sync functions
- use async methods from stream-zip
- note that stream-zip still does a sync->async conversion under the hood
- follow-up to #1933 for streaming download improvements
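The sync->async conversion noted above can be illustrated with a generic pattern: pull each item from a blocking iterator in a worker thread so the event loop stays free. This sketch uses only `asyncio.to_thread` (Python 3.9+); it is not stream-zip's actual code, and `iterate_in_thread` is a hypothetical name.

```python
import asyncio
from typing import AsyncIterator, Iterable, TypeVar

T = TypeVar("T")
_DONE = object()  # sentinel marking the end of the sync iterator


async def iterate_in_thread(sync_iterable: Iterable[T]) -> AsyncIterator[T]:
    """Hypothetical helper: expose a blocking iterator as an async iterator.

    Each next() call runs in a worker thread, so the event loop is not
    blocked while the underlying sync generator produces its next chunk.
    """
    iterator = iter(sync_iterable)
    while True:
        item = await asyncio.to_thread(next, iterator, _DONE)
        if item is _DONE:
            return
        yield item
```

Because every chunk still goes through a thread hop, the async methods remove blocking from the event loop but do not make the underlying zip generation itself asynchronous.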
…s: (#1982)
- download via presigned URLs using requests instead of boto APIs; remove boto
- follow-up to #1933 for streaming download improvements
- fix datapackage.json in multi-WACZ so its resources objects have the same fields (`name`, `path`, `hash`, `bytes`) as in a single WACZ
- add additional metadata to multi-WACZ datapackage.json, including `type` (`crawl`, `upload`, `collection`, `qaRun`), `id` (unique id for the object), `title` / `description` if available (for crawl/upload/collection), and `crawlId` for `qaRun`
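To make the datapackage.json change concrete, here is a sketch of the resource entries and extra metadata described above. Assumptions: the helper names `wacz_resource` and `multi_wacz_datapackage` are hypothetical, the `sha256:<hex>` hash format follows the convention used in single-WACZ datapackages, and the metadata keys are placed at the top level of the package.

```python
import hashlib
import json
from typing import Any, Dict, List, Optional


def wacz_resource(name: str, data: bytes) -> Dict[str, Any]:
    """Hypothetical helper: describe one nested WACZ with name/path/hash/bytes."""
    return {
        "name": name,
        "path": name,
        # "sha256:<hex>" mirrors the single-WACZ convention (assumed here)
        "hash": "sha256:" + hashlib.sha256(data).hexdigest(),
        "bytes": len(data),
    }


def multi_wacz_datapackage(
    item_type: str,
    item_id: str,
    resources: List[Dict[str, Any]],
    title: Optional[str] = None,
    description: Optional[str] = None,
    crawl_id: Optional[str] = None,
) -> str:
    """Hypothetical helper: build a datapackage.json body with extra metadata.

    Placing these keys at the top level is an assumption for illustration.
    """
    if item_type not in ("crawl", "upload", "collection", "qaRun"):
        raise ValueError(f"unknown type: {item_type}")
    package: Dict[str, Any] = {"type": item_type, "id": item_id}
    if item_type == "qaRun":
        # crawlId identifies the crawl the QA run belongs to
        package["crawlId"] = crawl_id
    else:
        # title/description are only included when available
        if title:
            package["title"] = title
        if description:
            package["description"] = description
    package["resources"] = resources
    return json.dumps(package, indent=2)
```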
Fixes #1412
Changes
Backend
`all-crawls`, `crawls`, and `uploads` API endpoints to download an archived item as multi-WACZ
Frontend
Adds the ability to download an archived item from:
- Files tab
- Detail actions menu
- Archived items list menu