-
Notifications
You must be signed in to change notification settings - Fork 633
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix low-res images in Medium pages that uses lazy loading #588
Conversation
Thanks for the patch, this looks interesting! I'm a bit confused because the yahoo testcase seems to miss images in your screenshot? I don't mind the small markup changes, but I'd prefer to see the diffs of the resulting |
@gijsk I've updated the test for those pages.
No, it just Puppeeter being weird when taking screenshot. If you view it in browser normally, the Youtube image is still there. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really interesting! I left some comments, mostly relating to some grammar/spelling, but there's a few concrete issues:
- this won't improve which image we use everywhere - though I think we can take the patch, it'd be nice if we kept the source of the image we dropped, so that consumers of the readability markup could choose to e.g. do a HEAD request on both images to determine their size, or apply some heuristic to the URL or something.
- there are some tricky issues with looping over elements and removing them...
<p>Get fast and easy access to the features you use most in the new menu. Open the “Customize” panel to add, move or remove any button you want. Keep your favorite features — add-ons, private browsing, Sync and more — one quick click away.</p> | ||
<p><img src="http://mozorg.cdn.mozilla.net/media/img/firefox/desktop/customize/designed-redesigned-high-res.6efd60766484.png" data-processed="false" data-src="//mozorg.cdn.mozilla.net/media/img/firefox/desktop/customize/designed-redesigned.fbd3ee9402e6.png" data-high-res="true" data-high-res-src="//mozorg.cdn.mozilla.net/media/img/firefox/desktop/customize/designed-redesigned-high-res.6efd60766484.png" alt="" id="designed-mobile" /> </p> | ||
<p><img src="http://mozorg.cdn.mozilla.net/media/img/firefox/desktop/customize/designed-redesigned.fbd3ee9402e6.png" alt="" id="designed-mobile" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it's hard to know what would happen here today, because the original images no longer load and the page has disappeared off google, but the file names here indicate that we're now loading the mobile version of the image instead of the high-res one. That's the opposite of what you wanted to accomplish, right?
I'm not sure there's a way to really solve both cases, short of trying to come up with heuristics to determine which URL is better - or creating some kind of pluggability where the consumer of the library can fetch both and determine which they want... which seems tricky, because it more or less entails starting to use await
and promises in the library, which is its own project.
Could we keep the old/other source available on the image in a separate attribute, so that consumers of the resulting markup could still make their own decisions later? That might be an easier way of giving control to the library consumers without needing too many architectural changes inside readability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in d32f39f
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit puzzled because the image for which this conversation was started still only has 1 attribute?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this conflicts with the code from #542 in _fixLazyImages
, because your patch is replacing the original image with the one from inside the noscript
element, without preserving the data-src
etc. attributes.
I'll add a suggestion on how to fix this.
Apologies for the delay here; it's code freeze crunch time and I'm stretched a little thin at the moment. I hope to be able to look at this later today, worst case tomorrow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies again for the slow review. I have one more nitpick to really preserve high-res images here, and then I think we're good.
<p>Get fast and easy access to the features you use most in the new menu. Open the “Customize” panel to add, move or remove any button you want. Keep your favorite features — add-ons, private browsing, Sync and more — one quick click away.</p> | ||
<p><img src="http://mozorg.cdn.mozilla.net/media/img/firefox/desktop/customize/designed-redesigned-high-res.6efd60766484.png" data-processed="false" data-src="//mozorg.cdn.mozilla.net/media/img/firefox/desktop/customize/designed-redesigned.fbd3ee9402e6.png" data-high-res="true" data-high-res-src="//mozorg.cdn.mozilla.net/media/img/firefox/desktop/customize/designed-redesigned-high-res.6efd60766484.png" alt="" id="designed-mobile" /> </p> | ||
<p><img src="http://mozorg.cdn.mozilla.net/media/img/firefox/desktop/customize/designed-redesigned.fbd3ee9402e6.png" alt="" id="designed-mobile" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit puzzled because the image for which this conversation was started still only has 1 attribute?
<p>Get fast and easy access to the features you use most in the new menu. Open the “Customize” panel to add, move or remove any button you want. Keep your favorite features — add-ons, private browsing, Sync and more — one quick click away.</p> | ||
<p><img src="http://mozorg.cdn.mozilla.net/media/img/firefox/desktop/customize/designed-redesigned-high-res.6efd60766484.png" data-processed="false" data-src="//mozorg.cdn.mozilla.net/media/img/firefox/desktop/customize/designed-redesigned.fbd3ee9402e6.png" data-high-res="true" data-high-res-src="//mozorg.cdn.mozilla.net/media/img/firefox/desktop/customize/designed-redesigned-high-res.6efd60766484.png" alt="" id="designed-mobile" /> </p> | ||
<p><img src="http://mozorg.cdn.mozilla.net/media/img/firefox/desktop/customize/designed-redesigned.fbd3ee9402e6.png" alt="" id="designed-mobile" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this conflicts with the code from #542 in _fixLazyImages
, because your patch is replacing the original image with the one from inside the noscript
element, without preserving the data-src
etc. attributes.
I'll add a suggestion on how to fix this.
Looks great, thank you! |
I'm having a bit of trouble getting this to work as expected; when I run it against a Medium page, the output includes both the low-res and the high-res images. If I regenerate the lazy-image-1 testcase, I get a different source file than what's in this PR, so it's possible that Medium's changed their site in the past few days: @RadhiFadlillah I'll try to look into fixing this soon if you don't have time to. |
Several months ago, an issue is submitted to my project that uses
go-readability
, the port ofreadability.js
for Go. I think I've fixed it there, so might as well put it here as well.The summary of that issue is, Medium pages lazy load the image using JavaScript. However, they also provide the high quality image inside
noscript
tag in case the user disabled the JS in their browser. Unfortunately, right now Readability removes allnoscript
tags without exception.By the way, this issue is only happened when Readability is used in server, and works fine in Firefox reader mode.
To fix this issue, I decide to unwrap
img
from insidenoscript
if it fulfills two criteria :noscript
isimg
; andnoscript
only contains exactly oneimg
element.To prevent wrong unwrapping, I also did these steps before unwrapping the
noscript
:div
that contains exactly one image, then put it out outside thediv
. This is done toprevent case where a
noscript
has direct previous sibling of adiv
that doesn't have any use except containing a singleimg
.img
that doesn't have any sources. This is done, because in Medium they use emptyimg
as placeholder. If it doesn't removed, after unwrapping we will ended up with two image, both the low quality and high quality image.For sample, I use this Medium article and here is the result :
Unfortunately, the test is failed for 5 pages :
However, from those failed tests, all changes is only happened in its DOM structure, yet visually there are (almost) no change as can be seen below :
So, what do you think ? If you think it's fine, I will update the expected result for those tests later.