I ran into many problems with Twint, so I have decided to stop developing this project. If you liked this solution, you may be interested in my other library: https://github.com/markowanga/stweet.
Sometimes you need to scrape a very large amount of tweet data in a short time. This project helps with that task. The solution is based on Twint, a popular tool for scraping Twitter data.
- Prepare a scalable microservice architecture that can be distributed across many machines
- Divide a single scrape task into small tasks
- Handle worker failures: when one worker hits an error, the elementary task can be repeated on another instance
- Work around the Twitter limit that prevents downloading large amounts of data from one IP address
- Gather all data in one location
- Use Docker whenever possible
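
The goal of repeating a failed elementary task on another instance can be illustrated with a small in-memory sketch. This is not the project's code: `run_with_requeue`, the worker callables, and `max_attempts` are hypothetical names, and a standard-library `queue.Queue` stands in for the real message broker.

```python
import queue
from typing import Callable, Dict, List, Sequence, Tuple


def run_with_requeue(
    tasks: Sequence[str],
    workers: Sequence[Callable[[str], str]],
    max_attempts: int = 3,
) -> Tuple[Dict[str, str], List[str]]:
    """Run each task on some worker; requeue a failed task for another try."""
    pending: "queue.Queue[Tuple[str, int]]" = queue.Queue()
    for task in tasks:
        pending.put((task, 0))

    done: Dict[str, str] = {}
    failed: List[str] = []
    turn = 0
    while not pending.empty():
        task, attempts = pending.get()
        # Round-robin over workers (an assumption made for this sketch).
        worker = workers[turn % len(workers)]
        turn += 1
        try:
            done[task] = worker(task)
        except Exception:
            if attempts + 1 < max_attempts:
                # Put the task back so a later worker can pick it up.
                pending.put((task, attempts + 1))
            else:
                failed.append(task)
    return done, failed


def flaky_worker(task: str) -> str:
    # Simulates an instance that is blocked or broken.
    raise RuntimeError("worker failed")


def healthy_worker(task: str) -> str:
    return task.upper()


done, failed = run_with_requeue(["a", "b"], [flaky_worker, healthy_worker])
```

In this sketch both tasks eventually complete on the healthy worker, and `failed` only receives tasks that exhaust `max_attempts`.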
- The user submits scrape commands via an HTTP request
- In response to the request, the server adds scrape commands to RabbitMQ; the time bounds can be divided into small intervals
- Workers take the messages from RabbitMQ and scrape the data
- When an elementary task has finished, the data is uploaded to the server
- The server saves all received data to central storage
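
Dividing the time bounds of one scrape command into small intervals could look roughly like the sketch below. The function name `split_time_range` and the one-day step are assumptions made for illustration; the project may split the range differently.

```python
from datetime import datetime, timedelta
from typing import Iterator, Tuple


def split_time_range(
    since: datetime, until: datetime, step: timedelta
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive (start, end) pairs that cover [since, until)."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    start = since
    while start < until:
        # The last interval is clipped so it never runs past `until`.
        end = min(start + step, until)
        yield start, end
        start = end


# Each pair could become one elementary task published to the queue.
intervals = list(
    split_time_range(
        datetime(2020, 1, 1),
        datetime(2020, 1, 3, 12),
        timedelta(days=1),
    )
)
```

Smaller steps produce more messages but make each elementary task cheaper to repeat when a worker fails.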