IQSS/9506 thumbnail failure tracking and other performance improvements #9669

Merged

Member

@qqmyers qqmyers commented Jun 21, 2023

What this PR does / why we need it: This PR makes several changes to improve the performance of thumbnail generation and retrieval. It covers issue #9506 (also reported via QDR), the issue raised by Bikramjit Singh/Borealis when using relatively slow S3 storage (email between Borealis and @scolapasta, @pdurbin and @qqmyers), and additional issues discovered while investigating:

  • Caches the isThumbnailAvailable response in the ThumbnailServiceWrapper for the dataset file table in edit/view modes
  • Switches to returning a download URL (versus a base64-encoded copy) in the main dataset search display
  • Sets the dataset id in datasets returned in search results to enable the existing caching in the ThumbnailServiceWrapper.dvobjectThumbnailsMap (the lack of an id meant the caching map wasn't being populated)
  • Implements a previewshavefailed (previewimagefail) flag that is set the first time an attempt to create a thumbnail for a given file fails. The flag is then used to avoid retrying the thumbnail creation process every time a thumbnail is requested (or isThumbnailAvailable() is called). Adds API calls to reset this flag globally or per file (to allow retrying)
  • Switches to using streams (vs channels) in PDF thumbnail generation for the temp file case, since the channel.transferFrom method is not guaranteed to transfer all bytes (and can transfer 0 bytes) whereas the InputStream.transferTo method blocks until all bytes are transferred.
  • Refactors to remove duplicate code
  • Sets the preview available flag false when the attempt to copy temporary previews during Ingest fails
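
The stream-versus-channel point in the list above can be illustrated with a minimal sketch. This is not the code from the PR: the class and method names (`TempFileCopySketch`, `copyWithStream`, `copyWithChannelLoop`) are hypothetical, and only `InputStream.transferTo` (Java 9+) and `FileChannel.transferFrom` are real JDK calls.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration; not taken from the Dataverse code base.
class TempFileCopySketch {

    // Stream-based copy: transferTo keeps reading until end of stream,
    // so the temp file holds every byte when this method returns.
    static long copyWithStream(InputStream in, Path tempFile) throws IOException {
        try (OutputStream out = Files.newOutputStream(tempFile)) {
            return in.transferTo(out);
        }
    }

    // Channel-based copy: a single transferFrom call may move fewer bytes
    // than requested (possibly 0), so the caller has to loop and track position.
    static long copyWithChannelLoop(InputStream in, Path tempFile, long size) throws IOException {
        try (ReadableByteChannel src = Channels.newChannel(in);
             FileChannel dest = FileChannel.open(tempFile,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            long position = 0;
            while (position < size) {
                long moved = dest.transferFrom(src, position, size - position);
                if (moved <= 0) {
                    break; // no progress: stop rather than spin forever
                }
                position += moved;
            }
            return position; // may be less than size if the source ended early
        }
    }
}
```

The stream version needs no size bookkeeping, which is one reason it is the simpler choice when the whole input has to land in a temp file before thumbnail generation starts.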

Which issue(s) this PR closes:

Special notes for your reviewer:
This PR doesn't completely remove the use of a base64-encoded thumbnail URL, e.g. on the dataset and file pages where one base64 image is displayed. One could also remove it in that case, but the performance issues related to base64 generation arise primarily when many images have to be created before a page can be rendered.

This may also be useful for the SPA?
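
The trade-off between an embedded base64 thumbnail and a download URL, as described in the notes above, can be sketched roughly as follows. The class name, method names, and the URL shape are assumptions made for illustration and are not taken from the PR.

```java
import java.util.Base64;

// Hypothetical illustration; names and the URL pattern are assumptions.
class ThumbnailReferenceSketch {

    // Embeds the image bytes in the page itself: the server must read
    // (or generate) every thumbnail before the page can be rendered,
    // and base64 inflates the payload to roughly 4/3 of the raw size.
    static String asDataUrl(byte[] pngBytes) {
        return "data:image/png;base64," + Base64.getEncoder().encodeToString(pngBytes);
    }

    // Puts only a short link in the page; the browser fetches the image
    // afterwards, so page rendering does not wait on storage reads.
    static String asDownloadUrl(long fileId) {
        return "/api/access/datafile/" + fileId + "?imageThumb=true"; // assumed URL shape
    }
}
```

With many results on one page, the first form costs one storage read per thumbnail before any HTML is sent, while the second defers those reads to separate browser requests.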

Suggestions on how to test this: Verify that thumbnails appear for image/PDF files as before (regression); that the initial page load is faster and the Dataverse server is not making multiple S3 calls to render the initial root collection page; that files where thumbnail generation fails get marked with the previewshavefailed flag and subsequent accesses don't attempt to recreate the thumbnail; and that using the API call to reset the flag for one or all files results in a new attempt to create a thumbnail.
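
The failure-flag behavior being tested here can be modeled with a small sketch. Everything in it is hypothetical (the `FileRecord` type, the `previewImageFail` field name, and the pluggable generator function); it only shows the intended logic of failing once, skipping later attempts, and retrying after a reset. It is not the PR's implementation.

```java
import java.util.Optional;
import java.util.function.Function;

// Hypothetical model of the failure flag; not the Dataverse implementation.
class ThumbnailFailureFlagSketch {

    static class FileRecord {
        final long id;
        boolean previewImageFail = false; // assumed field name
        FileRecord(long id) { this.id = id; }
    }

    private final Function<FileRecord, Optional<byte[]>> generator;
    private int generationAttempts = 0;

    ThumbnailFailureFlagSketch(Function<FileRecord, Optional<byte[]>> generator) {
        this.generator = generator;
    }

    // Returns the thumbnail if one can be produced; skips the expensive
    // generation step entirely once a failure has been recorded.
    Optional<byte[]> getThumbnail(FileRecord file) {
        if (file.previewImageFail) {
            return Optional.empty();
        }
        generationAttempts++;
        Optional<byte[]> thumb = generator.apply(file);
        if (!thumb.isPresent()) {
            file.previewImageFail = true; // remember the failure
        }
        return thumb;
    }

    // Mirrors the idea of the reset API call: allow one more attempt.
    void clearFailureFlag(FileRecord file) {
        file.previewImageFail = false;
    }

    int getGenerationAttempts() {
        return generationAttempts;
    }
}
```

A test along the lines suggested above would request a thumbnail twice for a file that cannot be rendered, check that only one generation attempt happened, then clear the flag and confirm that a second attempt occurs.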

Does this PR introduce a user interface change? If mockups are available, please link/include them here: thumbnails on the main page now load ~asynchronously (versus the page not loading at all until all thumbs are available).

Is there a release notes update needed for this change?:

Additional documentation: admin API docs

@qqmyers qqmyers added the Size: 30 label (a percentage of a sprint; 21 hours; formerly size:33) Jun 21, 2023

coveralls commented Jun 21, 2023

Coverage Status

coverage: 20.063% (+0.003%) from 20.06%
when pulling 5149941 on QualitativeDataRepository:IQSS/9506-thumbnail-tracking
into b33fe57 on IQSS:develop.

Contributor

bikramj commented Jun 21, 2023

Thank you so much @qqmyers for implementing it, this will solve the slow page load issue for us and anyone using custom S3 endpoints.

Contributor

@sekmiller sekmiller left a comment

Looks good. Thanks for the updates.

@sekmiller sekmiller removed their assignment Nov 28, 2023
@pdurbin pdurbin self-assigned this Nov 30, 2023
qqmyers added a commit to QualitativeDataRepository/dataverse that referenced this pull request Dec 5, 2023
FWIW: QDR generates a 400px version here and then uses styling
 to fit the page. Not sure what the motivation for that was without
 digging.
@qqmyers qqmyers removed their assignment Dec 5, 2023
Member

pdurbin commented Dec 5, 2023

There are merge conflicts and the SQL script needs to be bumped.

qqmyers and others added 3 commits December 5, 2023 14:30
Conflicts (easy, just "add both"):
doc/sphinx-guides/source/api/changelog.rst
doc/sphinx-guides/source/api/native-api.rst
src/main/java/edu/harvard/iq/dataverse/api/Admin.java
@pdurbin pdurbin merged commit e3e122a into IQSS:develop Dec 5, 2023
11 of 12 checks passed
Member

pdurbin commented Dec 5, 2023

Found a regression, which Jim fixed (thank you!). I confirmed that I can still generate thumbnails for images and PDFs. For PDFs I had to install ImageMagick. See also this issue:

@qqmyers qqmyers deleted the IQSS/9506-thumbnail-tracking branch May 17, 2024 18:39
Labels
Feature: Performance & Stability; GDCC: Borealis (of interest to Borealis); GDCC: QDR (of interest to QDR); Size: 30 (a percentage of a sprint; 21 hours; formerly size:33)
Development

Successfully merging this pull request may close these issues.

Add a "no thumbnail" flag to mark problematic images (to avoid extra generation attempts)
5 participants