Algostat

⚠️ This repository is no longer maintained by Lukas Martinelli.

Tools to find the most frequently used C++ algorithms on GitHub.

Results

You can look at the results of 3869 analyzed C++ repos in my Google Spreadsheets or use the results.csv directly.

algorithm	sum	avg
swap	108363	28
find	81006	21
count	60306	16
move	57595	15
copy	48050	12
sort	33317	9
max	28848	7
equal	27467	7
min	21720	6
unique	18484	5
lower_bound	15017	4
remove	13972	4
replace	13262	3
upper_bound	11835	3
for_each	11518	3
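
The avg column appears to be the sum divided by the 3869 analyzed repos, rounded to the nearest integer. That relationship is inferred from the numbers above and is not stated by the author, so the sketch below is only a sanity check, not a description of how create-csv.py computes the column. The helper name average_per_repo is made up for illustration.

```python
# Number of analyzed repos, taken from the Results paragraph above.
ANALYZED_REPOS = 3869

# A few (algorithm, sum, avg) rows copied from the results table.
rows = [
    ("swap", 108363, 28),
    ("find", 81006, 21),
    ("count", 60306, 16),
    ("for_each", 11518, 3),
]

def average_per_repo(total: int, repos: int = ANALYZED_REPOS) -> int:
    """Assumed formula: total occurrences divided by repo count, rounded."""
    return round(total / repos)

for name, total, avg in rows:
    # Second column is the recomputed value, third is the value from the table.
    print(name, average_per_repo(total), avg)
```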

## Usage

For best results, disable Python's output buffering so that each script passes its lines on to the next one as soon as they are produced.

export PYTHONUNBUFFERED=true
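
Buffering matters here because the scripts are chained with pipes: a stage that buffers its output can hold back lines the next stage is waiting for. The environment variable applies to the whole interpreter. The sketch below shows a per-call alternative using a hypothetical pass-through stage; relay is not part of algostat and only illustrates the idea.

```python
import sys
from typing import Iterable, TextIO

def relay(lines: Iterable[str], out: TextIO = sys.stdout) -> int:
    """Write each line and flush right away so a downstream pipe sees it at once."""
    written = 0
    for line in lines:
        out.write(line if line.endswith("\n") else line + "\n")
        out.flush()  # explicit per-line flush instead of relying on PYTHONUNBUFFERED
        written += 1
    return written
```

A stage built this way would call relay(sys.stdin) as its entry point.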

Analyze top C++ repos on GitHub

Analyze the top C++ repos on GitHub and create a CSV file.

./top-github-repos.py | ./algostat.py | ./create-csv.py > results.csv
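
To give a feel for what the middle stage of this pipeline has to do, here is a deliberately naive sketch that counts calls to a few `<algorithm>` functions in a piece of C++ source. It is an assumption about the general approach, not the matching logic of algostat.py: the name list, the regular expression, and the function count_algorithms are all invented for this example.

```python
import re
from collections import Counter

# A small hand-picked subset of algorithm names (assumption: the real tool
# may track a different or much longer list).
ALGORITHMS = ("swap", "find", "count", "copy", "sort", "for_each")

# Matches "std::sort(" or a bare "sort (", but skips identifiers such as
# "my_sort(" or "sorted(" and member calls such as "v.find(".
CALL_PATTERN = re.compile(
    r"(?<![\w.])(?:std::)?(" + "|".join(ALGORITHMS) + r")\s*\("
)

def count_algorithms(source: str) -> Counter:
    """Count how often each listed algorithm name is called in the source text."""
    return Counter(match.group(1) for match in CALL_PATTERN.finditer(source))

example = """
std::sort(v.begin(), v.end());
auto it = std::find(v.begin(), v.end(), 3);
my_sort(v);
sort(a, b);
"""
counts = count_algorithms(example)
```

A text match like this also counts calls inside comments and strings, and functions of the same name from other namespaces, so real results need more careful filtering.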

Analyze all C++ repos on GitHub

Analyze all C++ repos listed in GHTorrent.

cat cpp_repos.txt | ./algostat.py | ./create-csv.py > results.csv

Distributed Analyzing with Redis Queue and workers

Use a Redis Queue to distribute jobs among workers and then fetch the results. You need to provide the ALGOSTAT_RQ_HOST and ALGOSTAT_RQ_PORT environment variables to each process so it knows the address of the Redis server.

export ALGOSTAT_RQ_HOST="localhost"
export ALGOSTAT_RQ_PORT="6379"
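
A script can pick these two variables up from its environment. The sketch below shows one way to do that; the function name redis_settings and the fallback defaults are assumptions for illustration, not taken from rq.py.

```python
import os
from typing import Mapping, Tuple

def redis_settings(env: Mapping[str, str] = os.environ) -> Tuple[str, int]:
    """Return (host, port) read from the two variables named in this README."""
    host = env.get("ALGOSTAT_RQ_HOST", "localhost")  # fallback is an assumption
    port = int(env.get("ALGOSTAT_RQ_PORT", "6379"))  # fallback is an assumption
    return host, port
```

Passing the environment in as a mapping keeps the function easy to try out with a plain dict instead of real environment variables.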

Next, fill the job queue with the top GitHub repos and the repos listed in GHTorrent, removing any duplicates.

./top-github-repos.py >> jobs.txt
cat cpp_repos.txt >> jobs.txt
sort -u jobs.txt | ./enqueue-jobs.py
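
The merge-and-deduplicate step can also be expressed in Python, which may help if you want to filter the job list further before enqueueing it. This is a sketch of the same idea as `sort -u`, with two differences: it strips whitespace and skips blank lines, and it sorts by code point instead of locale. The function name unique_jobs and the repo names in the example are made up.

```python
from typing import Iterable, List

def unique_jobs(*sources: Iterable[str]) -> List[str]:
    """Merge several line sources into one sorted list without duplicates."""
    seen = set()
    for source in sources:
        for line in source:
            job = line.strip()
            if job:  # ignore blank lines
                seen.add(job)
    return sorted(seen)

# Hypothetical repo names, standing in for the two job sources.
top_repos = ["alice/engine\n", "bob/parser\n"]
listed_repos = ["bob/parser\n", "carol/toolkit\n", "\n"]
jobs = unique_jobs(top_repos, listed_repos)
```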

On your workers, tell algostat.py to fetch its jobs from the Redis queue, then store the results in a results queue.

./algostat.py --rq | ./enqueue-results.py
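
Conceptually, each worker repeats the same loop: take a job, analyze it, push the result. The sketch below models that loop with in-memory deques standing in for the two Redis queues, so it runs without a server. It is an assumption about the overall flow, not the code behind the --rq flag, and run_worker and analyze are hypothetical names.

```python
from collections import deque
from typing import Callable, Deque

def run_worker(
    jobs: Deque[str],
    results: Deque[str],
    analyze: Callable[[str], str],
) -> int:
    """Pop jobs until the queue is empty and push one result per job."""
    done = 0
    while jobs:
        job = jobs.popleft()          # each job is taken by exactly one worker
        results.append(analyze(job))  # the result goes to the shared results queue
        done += 1
    return done
```

Because a job leaves the queue when it is popped, several workers can run this loop at the same time without analyzing the same repo twice.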

After that, aggregate the results into a single CSV file.

./fetch-results.py | ./create-csv.py > results.csv

Installation

Make sure you have Python 3 installed
Clone the repository
Install requirements with pip install -r requirements.txt

Using Docker for Deployment

You can use Docker to run the application in a distributed setup.

Redis

Run the redis server.

docker run --name redis -p 6379:6379 -d sameersbn/redis:latest

Get the IP address of your Redis server and assign it to the ALGOSTAT_RQ_HOST environment variable in all of the following docker run commands. This example uses 104.131.5.11.

Get the image

I have already set up an automated build, lukasmartinelli/algostat, which you can use.

docker pull lukasmartinelli/algostat

Or you can clone the repo and build the docker image yourself.

docker build -t lukasmartinelli/algostat .

Fill job queue

docker run -it --rm --name queue-filler \
-e ALGOSTAT_RQ_HOST=104.131.5.11 \
-e ALGOSTAT_RQ_PORT=6379 \
lukasmartinelli/algostat bash -c "cat cpp_repos.txt | ./enqueue-jobs.py"

Run the workers

Assign as many workers as you like.

docker run -it --rm --name worker1 \
-e ALGOSTAT_RQ_HOST=104.131.5.11 \
-e ALGOSTAT_RQ_PORT=6379 \
lukasmartinelli/algostat bash -c "./algostat.py --rq | ./enqueue-results.py"

Aggregate results

Note that this step is not repeatable. Once you've aggregated the results, the Redis list will be empty.

docker run -it --rm --name result-aggregator \
-e ALGOSTAT_RQ_HOST=104.131.5.11 \
-e ALGOSTAT_RQ_PORT=6379 \
lukasmartinelli/algostat bash -c "./fetch-results.py | ./create-csv.py"
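
The reason aggregation cannot be repeated is that fetching removes the results from the list. The sketch below illustrates that behavior with an in-memory deque in place of the Redis list. That results are popped and not merely read is inferred from the note above, and drain is a hypothetical name.

```python
from collections import deque
from typing import Deque, List

def drain(results: Deque[str]) -> List[str]:
    """Pop every item off the queue and return the items in order."""
    fetched = []
    while results:
        fetched.append(results.popleft())
    return fetched

queue = deque(["r1", "r2"])
first = drain(queue)   # gets both results
second = drain(queue)  # gets nothing, the first call emptied the queue
```

In practice this means the output of the aggregation run should be saved the first time, since a second run has nothing left to fetch.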

