Low-res images for sites that use progressive enhancement? #230

Closed
neezer opened this issue Jan 21, 2020 · 9 comments

@neezer

neezer commented Jan 21, 2020

I just tried adding a bookmark for a Medium article and noticed the images were imported into Shiori at an atrocious quality:

[Image: Screen Shot 2020-01-20 at 8 58 11 PM]

I'm guessing this is because Medium lazy-loads the higher-resolution copies with JS, but the Shiori importer doesn't wait around for that. That's my best guess anyways. Inspecting the Medium page source, I see that the images have a noscript tag near 'em with the full-quality version of the image... perhaps that could be useful when importing?

Think this is fixable?

@neezer
Author

neezer commented Jan 21, 2020

This was the article I was importing, if it's helpful for testing: https://medium.com/voodoo-engineering/node-js-and-cpu-profiling-on-production-in-real-time-without-downtime-d6e62af173e2

@neezer
Author

neezer commented Jan 21, 2020

Definitely two different URLs:

- https://miro.medium.com/max/60/1*87KlGgfbuWP38nAaQaj3xw.png?q=20
+ https://miro.medium.com/max/1860/1*87KlGgfbuWP38nAaQaj3xw.png

The former is the URL Shiori pulls; the latter is the URL in the fully loaded Medium article, and also the URL found in the adjacent noscript tags. In my testing, the second path segment (the number after max/: 60 vs. 1860) is the deciding factor; the query parameter q does not seem to make a significant difference, and it is missing entirely once the page fully loads and the JS executes on a given image.
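
To make the difference between the two URLs concrete, here is a minimal Python sketch that rewrites the placeholder URL into the full-size one. It assumes the path shape /max/<number>/<image-id> seen in the two URLs above, which is only an observation from this one article and not a documented Medium API. The function name upscale_medium_url and the size parameter are hypothetical and are not part of Shiori or go-readability.

```python
from urllib.parse import urlsplit, urlunsplit


def upscale_medium_url(url: str, size: int = 1860) -> str:
    """Swap the number after /max/ for `size` and drop the query string.

    Assumption: the path looks like /max/<number>/<image-id>, as in the
    two URLs quoted in this thread. Anything else is returned unchanged.
    """
    parts = urlsplit(url)
    segments = parts.path.split("/")  # ["", "max", "60", "<image-id>"]
    if len(segments) < 4 or segments[1] != "max" or not segments[2].isdigit():
        return url  # unknown shape: leave the URL untouched
    segments[2] = str(size)
    # Empty query and fragment: the fully loaded page carries no ?q=...
    return urlunsplit((parts.scheme, parts.netloc, "/".join(segments), "", ""))


if __name__ == "__main__":
    print(upscale_medium_url(
        "https://miro.medium.com/max/60/1*87KlGgfbuWP38nAaQaj3xw.png?q=20"
    ))
```

Guessing at a URL scheme like this is fragile, so reading the URL out of the noscript tag is probably the sturdier route.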

This problem is further compounded if you have attempted to Archive the page in Shiori; at present the images don't load at all:

[Image: Screen Shot 2020-01-20 at 9 23 57 PM]

Seems like this would work fine if Shiori pulled the noscript value instead of the given img value, but I'm unsure if that's safe/wise to do categorically.
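
As a rough illustration of the noscript idea, the Python sketch below pairs each placeholder img with the next img found inside a noscript element and records which src should replace which. It is only a sketch under the assumption that the noscript fallback directly follows its placeholder, as it appears to on the Medium page; it is not the fix that landed in go-readability, and the names NoscriptImageMapper and map_placeholders are hypothetical.

```python
from html.parser import HTMLParser


class NoscriptImageMapper(HTMLParser):
    """Map placeholder <img> sources to the <img> inside the next <noscript>."""

    def __init__(self):
        super().__init__()
        self.in_noscript = False
        self.last_placeholder = None  # src of the latest img outside noscript
        self.replacements = {}        # placeholder src -> noscript src

    def handle_starttag(self, tag, attrs):
        if tag == "noscript":
            self.in_noscript = True
            return
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if not src:
            return
        if self.in_noscript:
            # Pair the fallback image with the placeholder seen just before it.
            if self.last_placeholder is not None:
                self.replacements[self.last_placeholder] = src
                self.last_placeholder = None
        else:
            self.last_placeholder = src

    def handle_endtag(self, tag):
        if tag == "noscript":
            self.in_noscript = False


def map_placeholders(html: str) -> dict:
    parser = NoscriptImageMapper()
    parser.feed(html)
    parser.close()
    return parser.replacements
```

Whether this is safe to do categorically is exactly the open question: a page whose noscript block holds something unrelated, such as a tracking pixel, would be mis-paired by this naive approach.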

Right now I'm contemplating manually massaging the imported HTML in SQLite to point at the correct URLs, but that's obviously pretty labor-intensive and not something I'd like to do routinely. However, I've also noticed the embedded code examples didn't import at all, so I might have to do that anyways, as I want to have those archived too.

@8bitgentleman

I would also love to see a fix for Medium articles, as that's one of the more common sites I use.

@RadhiFadlillah
Collaborator

@neezer @8bitgentleman sorry for the late reply.

Just wanted to let you know that the fix for this issue has been implemented in go-readability.

However, it might take a while to merge it into Shiori, because I also want to improve the archival method, at least enough for Shiori to be able to archive pages from GitHub and its gists.

@fmartingr
Member

Hey everyone, I've tested this and it's currently working on the latest version:

[Image: Screenshot 2022-02-06 at 17 05 25]

I'm closing this as solved, but if you have any other issues please comment again so we can reopen.

@rundx

rundx commented Feb 24, 2022

This is still an issue when loading the archived version of Medium articles.

@fmartingr
Member

> This is still an issue when loading the archived version of Medium articles.

Have you tried updating the cache to see whether the new archived version is downloaded correctly?

@rundx

rundx commented Feb 26, 2022

> > This is still an issue when loading the archived version of Medium articles.
>
> Have you tried updating the cache to see whether the new archived version is downloaded correctly?

Yes, still the same:

[Image: Screen Shot 2022-02-26 at 12 35 52 PM]

@fmartingr
Member

I believe we have been talking about two different things here. This issue is about the content view of the article (which comes from go-readability), whereas your problem is with the archived version, which right now comes from warc. Warc is no longer maintained, and we need to migrate to obelisk (#353).
