Twint-Distributed

No longer supported

I ran into many problems with twint, so I decided to stop developing this library. If you liked this solution, you may be interested in my other library: https://github.com/markowanga/stweet.

Description

Sometimes a very large volume of tweets needs to be scraped in a short time. This project helps with that task. The solution is based on Twint, a popular tool for scraping Twitter data.

Main concepts

Prepare a microservice architecture that is scalable and can be distributed across many machines
Divide a single scraping task into small subtasks
Tolerate worker errors: when a worker fails, its elementary task can be repeated on another instance
Work around the Twitter limit that prevents downloading large amounts of data from one IP address
Gather all data in one location
Use Docker whenever possible
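
The idea of dividing one scraping task into small subtasks can be illustrated with a minimal sketch. It is not code from this repository: the function name `split_time_range` and the one-day step are assumptions chosen only for illustration.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def split_time_range(
    since: datetime, until: datetime, step: timedelta
) -> List[Tuple[datetime, datetime]]:
    """Split [since, until) into consecutive sub-intervals of at most `step`.

    Hypothetical helper: each returned pair could become one elementary task.
    """
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    intervals = []
    start = since
    while start < until:
        # The last interval is shortened so it never passes `until`.
        end = min(start + step, until)
        intervals.append((start, end))
        start = end
    return intervals


# Example: a three-day scrape becomes three one-day subtasks.
subtasks = split_time_range(
    datetime(2020, 1, 1), datetime(2020, 1, 4), timedelta(days=1)
)
for start, end in subtasks:
    print(start.date(), "->", end.date())
```

Smaller intervals mean that a failed subtask costs less to repeat, at the price of more messages in the queue.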

How it works

The user submits scraping commands via an HTTP request
In response, the server adds scraping commands to RabbitMQ; the time bounds can be divided into small intervals
Workers take messages from RabbitMQ and scrape the data
When an elementary task finishes, the data is uploaded to the server
The server saves all received data to central storage
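
The flow above, including repeating a failed elementary task, can be sketched as follows. This is a simplified, single-process illustration and not the project's implementation: Python's in-memory `queue.Queue` stands in for RabbitMQ, a plain dict stands in for the central storage, and `run_worker`, `flaky_scrape`, and `max_attempts` are hypothetical names.

```python
import queue
from typing import Callable, Dict, List


def run_worker(
    tasks: queue.Queue,
    scrape: Callable[[dict], List[str]],
    storage: Dict[str, List[str]],
    max_attempts: int = 3,
) -> List[dict]:
    """Process tasks until the queue is empty; requeue tasks that fail.

    Returns the tasks that still failed after `max_attempts` tries.
    """
    failed = []
    while True:
        try:
            task = tasks.get_nowait()
        except queue.Empty:
            break
        try:
            result = scrape(task)
        except Exception:
            # Put the task back so it can be retried (in the real system,
            # possibly by a different worker instance).
            task = {**task, "attempts": task.get("attempts", 0) + 1}
            if task["attempts"] < max_attempts:
                tasks.put(task)
            else:
                failed.append(task)
            continue
        # Stand-in for uploading the result to the server's central storage.
        storage[task["id"]] = result
    return failed


calls: Dict[str, int] = {}


def flaky_scrape(task: dict) -> List[str]:
    """Simulated scraper: task "b" fails on its first attempt only."""
    calls[task["id"]] = calls.get(task["id"], 0) + 1
    if task["id"] == "b" and calls["b"] == 1:
        raise RuntimeError("simulated worker error")
    return [f"tweet from {task['id']}"]


tasks = queue.Queue()
for task_id in ("a", "b"):
    tasks.put({"id": task_id})

storage: Dict[str, List[str]] = {}
failed = run_worker(tasks, flaky_scrape, storage)
print(sorted(storage), failed)
```

With a real message broker, the same effect is usually achieved by acknowledging a message only after the result has been uploaded, so an unacknowledged task is redelivered.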

