-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Process media urls #101
Process media urls #101
Conversation
nft_ingester/src/json_downloader.rs
Outdated
@@ -19,6 +19,8 @@ pub struct JsonDownloader { | |||
pub metrics: Arc<JsonDownloaderMetricsConfig>, | |||
} | |||
|
|||
pub const MEDIA_FILES_EXTENSIONS: &[&str] = &["png", "jpeg", "gif", "mp3", "mov", "avi"]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there some documentation that indicates all possible media formats for NFT? Or maybe some lib/crate with such info? Because this list is way far from all existing formats
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree, the thing is that people can upload there absolutely any file. That list contains formats we already saw at NFTs, so in future it may grow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe let's try to find some rust crate that contains a similar list but extended? I think it can be in some media-streaming-related lib
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't see the reason to add external dependency until we see that there is huge variety of media formats. Right now in postgre failed tasks only for png, jpeg and gif
nft_ingester/src/json_downloader.rs
Outdated
let lowercased_url = url.to_lowercase(); | ||
|
||
for extension in MEDIA_FILES_EXTENSIONS.iter() { | ||
if lowercased_url.ends_with(extension) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe trim tailing slashes before use ends_with?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We take url from task and task from postgre, in postgre we write url which is already trimmed. The idea was that job of this func simply check it's ending
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How should we treat the urls that are invalid or going nowhere, like https://nosite-at-all.not.even/a.site.gif
?
Will try to find such cases and compare with other providers. |
Maybe add '.' before every element in MEDIA_FILES_EXTENSIONS? Because url can be ended with some of this list element, if will not be an extension but just part of url hash. I mean to replace elements from "png" to ".png" etc |
What
This PR adds possibility for JSON downloader to process media urls without failing. Basically we just save media url to the DB without downloading anything. In Rocks we mark metadata as an empty string not to trigger API, there is check that smth is written to the offchain_data
UPD
Now we check header Content-Type instead of url