Previewing/exporting Instagram data collected with Zeeschuimer #387

Closed
leelum opened this issue Sep 7, 2023 · 3 comments

Comments

@leelum

leelum commented Sep 7, 2023

Describe the bug
After importing a collection of Instagram data collected through Zeeschuimer, I have been unable to preview the data or export it to CSV.

  • Trying to explore the data results in an Internal Server Error
  • Trying to download as CSV results in a failed download.
  • Attempting to use the Convert NDJSON file to CSV tool results in the conversion hanging, with the following log:

Thu Sep 7 10:49:59 2023: Processing 'Convert NDJSON file to CSV' started for dataset 7bbfdf83551737468f517850d178113b
Thu Sep 7 10:49:59 2023: Processing data
Thu Sep 7 10:50:01 2023: Converting file
Thu Sep 7 10:50:01 2023: Processor crashed ('NoneType' object is not subscriptable), trying again later

To Reproduce
Collect data from Instagram using Zeeschuimer, focusing predominantly on Instagram Reels from more than one account via Instagram.com//reels.

4CAT Environment

  • Own server/desktop
  • If accessing via your own server/desktop, what is the environment and are you using Docker?: macOS, using Docker Desktop.

Relevant error log from Docker:

2023-09-07 11:50:01 4cat_backend | 07-09-2023 10:50:01 | ERROR at search_instagram.py:200: Processor convert-ndjson-csv raised TypeError while processing dataset 7bbfdf83551737468f517850d178113b (via 18ece519fe48b0b80c835f4d386d5def) in ndjson_to_csv.py:52->dataset.py:348->search_instagram.py:64->search_instagram.py:200:
2023-09-07 11:50:01 4cat_backend | 'NoneType' object is not subscriptable
2023-09-07 11:50:01 4cat_backend |
2023-09-07 11:50:02 4cat_backend | 07-09-2023 10:50:02 | INFO at processor.py:159: Running processor convert-ndjson-csv on dataset 59a4884ac3ca752755f85bac47c0cd4d
2023-09-07 11:50:04 4cat_backend | 07-09-2023 10:50:04 | ERROR at search_instagram.py:200: Processor convert-ndjson-csv raised TypeError while processing dataset 59a4884ac3ca752755f85bac47c0cd4d (via 18ece519fe48b0b80c835f4d386d5def) in ndjson_to_csv.py:52->dataset.py:348->search_instagram.py:64->search_instagram.py:200:
2023-09-07 11:50:04 4cat_backend | 'NoneType' object is not subscriptable
2023-09-07 11:50:04 4cat_backend |

@dale-wahl
Member

Odd, that looks like Zeeschuimer collected a post without a username, and 4CAT is failing because it expects every post to have one.

We are working on a fix that will skip bad items like that, update the log to let you know, and allow you to at least preview the data. For the moment you can download the NDJSON file to view the collected data (you could even find the invalid post). Until we are able to deploy that fix, though, it is a bit more complicated to repair the dataset in a way that lets 4CAT still run processors on it.

As to why a post has no username, that's something Zeeschuimer will need to address. I just made an issue there. I think it might help if you posted the NDJSON (if it is not too large) or, better yet, just the offending post if possible.
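For anyone who wants to track down the offending post in the downloaded NDJSON file, a rough sketch along these lines might help. It is not part of 4CAT or Zeeschuimer. The `user` and `username` field names, and the idea that each line wraps the post in a `data` field, are assumptions about the file layout and may need adjusting.

```python
import json


def find_items_without_username(lines, user_key="user", name_key="username"):
    """Return (line_number, item) pairs whose user object or username is missing.

    `lines` is any iterable of NDJSON lines, e.g. an open file handle.
    The key names are assumptions about how the posts are structured.
    """
    offending = []
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        item = json.loads(line)
        # Assumption: the post may be wrapped in a "data" field; fall back
        # to the top-level object if that field is absent.
        post = item.get("data", item) if isinstance(item, dict) else None
        user = post.get(user_key) if isinstance(post, dict) else None
        if not isinstance(user, dict) or not user.get(name_key):
            offending.append((line_number, item))
    return offending
```

To use it, open the downloaded file (for example with `open("dataset.ndjson", encoding="utf-8")`, where the filename is hypothetical), pass the file handle to `find_items_without_username`, and print the returned line numbers.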

@leelum
Author

leelum commented Sep 7, 2023

Brill - thanks for looking into this!

@dale-wahl
Member

1101a0a fixes this. 4CAT will now skip items that do not map correctly (such as this example of a post without a username), update the dataset log, and notify both the user and the administrator of 4CAT.
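The general idea of skipping items that fail to map can be sketched as follows. This is only an illustration and not the code from the commit: `map_item`, `map_items_safely`, and the `id`/`user`/`username` fields are hypothetical names chosen for the example.

```python
def map_item(item):
    """Flatten one raw post (hypothetical mapping).

    Raises TypeError ("'NoneType' object is not subscriptable") when
    item["user"] is None, and KeyError when a field is absent.
    """
    return {"id": item["id"], "author": item["user"]["username"]}


def map_items_safely(items, log=print):
    """Map every item, skipping those that cannot be mapped.

    Returns the mapped rows and the number of skipped items, and reports
    the skipped count through `log` so that it is not lost silently.
    """
    mapped = []
    skipped = 0
    for item in items:
        try:
            mapped.append(map_item(item))
        except (KeyError, TypeError):
            skipped += 1  # bad item: count it and carry on
    if skipped:
        log(f"{skipped} item(s) could not be mapped and were skipped")
    return mapped, skipped
```

Catching the error per item means that one malformed post no longer aborts the whole preview or CSV conversion, while the log message keeps the data loss visible.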
