You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Right now, getmedia.py downloads all media files that it can. But with retweets and resharing images and other forms of media, we might not need to keep all copies of media files.
Instead, we might want to create unique identifiers for all unique media files, then link those unique identifiers with each tweet in which that media appears.
If that's the case, we might hash media files to get a bit-level representation of unique files; store all unique hashes somewhere (perhaps alongside the main data collection, like we do with stream limit messages); compare new media file hashes against the list of existing media file hashes; and only retain unique media files.
The text was updated successfully, but these errors were encountered:
Right now, getmedia.py downloads all media files that it can. But with retweets and resharing images and other forms of media, we might not need to keep all copies of media files.
Instead, we might want to create unique identifiers for all unique media files, then link those unique identifiers with each tweet in which that media appears.
If that's the case, we might hash media files to get a bit-level representation of unique files; store all unique hashes somewhere (perhaps alongside the main data collection, like we do with stream limit messages); compare new media file hashes against the list of existing media file hashes; and only retain unique media files.
The text was updated successfully, but these errors were encountered: