The links_images field is very useful for reverse image search and for showing thumbnails as part of a search result. Similar links_videos and maybe links_sounds fields would have equal benefits.
Unfortunately, video is messy to extract, as it has historically been hacked into pages in different ways. Using iframe was popular at one point.
The problem here is that the only indication that the iframe contains a video, and not an image, an HTML page or something else, is the URL for the video, and that is in no way guaranteed to have a usable extension. Some ideas:
1. Only populate links_videos with "guaranteed" videos, i.e. those with known video extensions.
2. Index all iframe#src values and move the resolve logic to the GUI: first extract all the URLs, then request their content_type_norm field.
If method 2 is used, it might be better to have a field links_resources with all inlined resources (except images). That would also catch sounds and make it possible to e.g. check if a page was iframed from somewhere.
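
To make the two ideas concrete, here is a rough Python sketch, not a proposal for the actual indexer code. It collects every iframe#src into links_resources, fills links_videos by extension (method 1), and optionally resolves the remaining URLs through a caller-supplied lookup (method 2). The extension list, the function names and the `lookup_content_type` callback are all made up for illustration, as is the assumption that content_type_norm is "video" for video resources.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
import posixpath

# Assumed list of "known" video extensions; purely illustrative.
VIDEO_EXTENSIONS = {".mp4", ".webm", ".ogv", ".mov", ".avi", ".mkv", ".flv"}


class IframeSrcCollector(HTMLParser):
    """Collects the absolute URL of every iframe#src in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag != "iframe":
            return
        src = dict(attrs).get("src")
        if src:
            # Resolve relative URLs against the page URL.
            self.sources.append(urljoin(self.base_url, src))


def has_video_extension(url):
    """Method 1: trust only the extension of the URL path."""
    path = urlparse(url).path
    return posixpath.splitext(path)[1].lower() in VIDEO_EXTENSIONS


def extract_links(html, base_url, lookup_content_type=None):
    """Return hypothetical links_videos / links_resources values for a page.

    lookup_content_type is a hypothetical callback (method 2) that maps a
    URL to its content_type_norm value, e.g. by querying the index.
    """
    collector = IframeSrcCollector(base_url)
    collector.feed(html)
    collector.close()
    resources = collector.sources

    videos = [url for url in resources if has_video_extension(url)]
    if lookup_content_type is not None:
        for url in resources:
            # Assumption: content_type_norm is "video" for video resources.
            if url not in videos and lookup_content_type(url) == "video":
                videos.append(url)

    return {"links_videos": videos, "links_resources": resources}
```

Without the callback only the extension check applies, so an iframe pointing at something like `player?id=7` ends up in links_resources but not in links_videos. That is exactly the gap method 2 is meant to close.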