Support Django 4.0 and python 3.10; Support import of "new" Twitter archive downloads #231

philgyford · 2022-02-14T11:44:15Z

Add support for Django 4.0
Ensure support for python 3.10
Drop support for Django 2.2 and 3.1
Require django-taggit v2.0.0
Add support for importing the Twitter archive format that was introduced sometime in early 2019. Retain support for the previous format with an argument on the import_tweets command. When used with the new archive format, the updated command also imports the media files included in the archive.

Some tweet JSON has `display_text_range` set as strings rather than ints, e.g. `["0", "140"]` rather than `[0, 140]`. Some tweet JSON also has `["entities"][<kind>]["indices"]` set as strings rather than ints, e.g. `["0", "9"]` rather than `[0, 9]`. Both happen particularly when the JSON has come from the downloaded Twitter archive. For #229
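A minimal sketch of the kind of normalisation this describes, not the code from this PR. The function names `coerce_indices` and `normalise_tweet_ranges` are hypothetical, and the sketch assumes each value under `entities` is a list of dicts, as in v1.1 API tweet JSON.

```python
def coerce_indices(values):
    """Return a list of ints from a list of ints or numeric strings."""
    return [int(value) for value in values]


def normalise_tweet_ranges(tweet):
    """Convert string offsets in a tweet dict to ints, in place, and return it.

    Hypothetical helper: handles `display_text_range` and the `indices`
    of every item under `entities`, whichever kind it belongs to.
    """
    if "display_text_range" in tweet:
        tweet["display_text_range"] = coerce_indices(tweet["display_text_range"])

    for items in tweet.get("entities", {}).values():
        for item in items:
            if isinstance(item, dict) and "indices" in item:
                item["indices"] = coerce_indices(item["indices"])

    return tweet
```

Values that are already ints pass through unchanged, so the same function could be applied to tweets from the API and from the archive alike.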

* Added the `Version2TweetIngester` class which collates the twitter user data from three separate archive files, and the tweet data from the single large `tweet.js` file, and passes all that to the saver. We add a note to the user data to make it clear - when it's saved to the database as "raw" data - that it was compiled by this code, and doesn't come directly from the API/archive.
* Adjusted the `TweetSaver` class so that it can be passed data about a twitter user separately. The API, and the previous twitter archive, included the user data within each tweet's data. But now, presumably to save space, the individual tweets' JSON doesn't include the user data. So we now pass the `TweetSaver` the user data as a separate object.
* Added tests for the `Version2TweetIngester`.

Still to do:

* Waiting for an archive of a private twitter account, in order to see what the structure of the `protected-history.js` file is like, so that we can correctly set the privacy status of the account.
* Given the 2019+ archive includes media files for the tweets, we may as well import all those in the Ingester as `Media` objects.

For #229
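As a rough illustration of the collation step, here is a sketch under stated assumptions rather than the `Version2TweetIngester` itself. It assumes each archive `.js` file is a JavaScript assignment whose right-hand side is valid JSON (e.g. `window.YTD.tweet.part0 = [ ... ]`). The helper names and the `note` key are hypothetical.

```python
import json


def parse_archive_js(text):
    """Return the JSON payload from the contents of an archive .js file.

    Assumes the text is `<something> = <valid JSON>`; everything before
    the first "=" is discarded.
    """
    _, separator, payload = text.partition("=")
    if not separator:
        raise ValueError("No '=' found; not the expected archive format")
    return json.loads(payload)


def build_user_data(*parts):
    """Merge dicts taken from separate archive files into one user dict.

    Later parts win on key clashes. The "note" key is a made-up name for
    marking the result as compiled data, not raw API/archive data.
    """
    user = {}
    for part in parts:
        user.update(part)
    user["note"] = "Compiled from several archive files; not raw API data"
    return user
```

The merged user dict would then be handed to the saver alongside each tweet, since the tweets in the newer archive no longer carry their own user data.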

* The downloaded archive includes all the media files associated with a user's tweets, so we can import them relatively easily.
* We import the MP4s Twitter uses to display animated GIFs and the image files for JPGs/PNGs. We don't import video files that were uploaded as such because we don't currently include those when fetching media files from the API, so this is to remain consistent.
* When we fetch media files for animated GIFs from the API, we fetch both the MP4 and a JPG of it. Although we have the path for both in the tweet data in the archive, only the MP4 is present in the `tweet_media` directory so we only import that.

For #229
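The selection rules above could be sketched roughly as follows. This is an illustration, not the importer's actual code. It assumes media items shaped like v1.1 API `extended_entities` media (`type`, `media_url_https`, `video_info.variants`), and it assumes files in `tweet_media` are named `<tweet id>-<file name from the URL>`. The function name is hypothetical.

```python
def archive_media_filename(tweet_id, media):
    """Return the file name to look for in `tweet_media`, or None to skip.

    Photos use their image file; animated GIFs use their MP4 only;
    anything else (e.g. uploaded videos) is skipped.
    """
    kind = media.get("type")

    if kind == "photo":
        url = media.get("media_url_https", "")
    elif kind == "animated_gif":
        variants = media.get("video_info", {}).get("variants", [])
        mp4_urls = [
            variant["url"]
            for variant in variants
            if variant.get("content_type") == "video/mp4" and "url" in variant
        ]
        url = mp4_urls[0] if mp4_urls else ""
    else:
        return None

    if not url:
        return None

    # Drop any query string, then keep the last path segment.
    basename = url.split("?")[0].rsplit("/", 1)[-1]
    return f"{tweet_id}-{basename}"
```

Returning `None` for unsupported types keeps the caller's loop simple: it can skip any media item for which no file name comes back.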

…data (a) because we can't get it from the downloaded archive https://twittercommunity.com/t/download-archive-does-not-include-current-protected-status/166622 and (b) because that value should be set when saving the Account object, which fetches the User data from Twitter before the import. For #229

coveralls · 2022-02-14T11:46:36Z

Coverage decreased (-0.2%) to 93.867% when pulling 6aca66d on v2 into 591bdce on main.

philgyford added 22 commits December 15, 2021 15:06

Start adding support for Django 4.0 and python 3.10

34fbce6

It *should* probably all work once django-taggit releases a version that supports Django 4.0 and python 3.10. See jazzband/django-taggit#776 For #223

Update python dependencies

a4a7155

Add Django 2.2 back into supported versions for a while

4184b55

Remove Django 2.2 again.

64526cf

Remembered that I removed it because it won't work with django-taggit 2.0 that is needed for Django 4.0.

Update devproject's dependencies

ce88835

OK, dropping support for Django 2.2 and 3.1.

7cb72d3

Really this time.

Fix python version in GitHub Action test workflow

d21e586

No, the sequence doesn't go 3.8, 3.9, 4.0...

Move test_ingest.py to test_ingest_v1.py

1eb5445

In preparation for importing the "new" 2019 format of Twitter archives. For #229

Make existing TweetIngester Version1TweetIngester

81d3c0a

So that we can keep it working, for those with older existing Twitter Archive downloads, while adding a newer ingester for those with 2019+ downloads. For #229

Formatting

ba96bc5

Alter verbose names of Twitter API key fields

2eeec0a

To be in sync with what they're called on Twitter these days. And change link to Twitter developer portal.

Add a bit of logging for when an API call fails

8015283

I'm sure this isn't an ideal way, but it's better than not doing anything which is what was happening before.

Update Twitter documentation

dd90c1f

* To account for new procedures for setting up a Twitter App and applying for Extended Permissions, which are required to access the v1.1 API
* And to document the two versions of the import management command.

For #229

Fix errors found after running tests

ce22222

* Make passing "private" in when saving Twitter User data optional (because it's not in the downloaded archive of data)
* Cope with the fact some idiot (me) decided to pass either a dict *or* a boolean from a method.

For #229

Update name of changelog

b71dc44

Update for v2.0.0 release

6f6e0aa

Rebuild documentation for v2.0.0

c7cd4ef

Please the linter

6aca66d

For #229

philgyford merged commit fce848f into main Feb 14, 2022
