make fresh
python3 parse_dblp.py
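YEAR=2018 python3 top_authors.py  # data from the last 5 years
YEAR=2020 python3 top_authors.py  # data from the last 3 years
Because the CCF-A and Tsinghua-A data sets are very large, a browser may freeze when opening them; converting the pages to txt with w3m is recommended.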
This repository establishes simple statistics for a set of conferences.
Using the DBLP data set, we extract the top conferences and then aggregate their publications on a per-author basis. For different subgroups (e.g., security, embedded systems, or OS), we then calculate per-author statistics and present them in an overview.
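As a rough illustration of the per-author aggregation described above, the sketch below counts publications per author, both overall and per area. The record layout, the sample data, and the `per_author_counts` helper with its `min_year` parameter are all hypothetical; they are not taken from `top_authors.py`, and the year cutoff is only our assumption about how a "recent years" filter might look.

```python
from collections import Counter, defaultdict

# Hypothetical records of the form (area, year, authors).
# The real data is extracted from DBLP; these entries are made up.
PUBS = [
    ("security", 2020, ["Alice", "Bob"]),
    ("security", 2019, ["Alice"]),
    ("os", 2020, ["Bob", "Carol"]),
    ("os", 2017, ["Carol"]),
]


def per_author_counts(pubs, min_year=0):
    """Count publications per author, overall and per area, from min_year on."""
    total = Counter()
    by_area = defaultdict(Counter)
    for area, year, authors in pubs:
        if year < min_year:
            continue  # skip publications older than the cutoff
        for author in authors:
            total[author] += 1
            by_area[area][author] += 1
    return total, by_area


total, by_area = per_author_counts(PUBS, min_year=2018)
for author, count in total.most_common():
    print(author, count)
```

Keeping one `Counter` per area next to the overall one means the per-area tables and the cross-area overview can be filled in a single pass over the publications.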
Processing happens in two stages:
- parse_dblp.py extracts all publications and dumps them into pickle files based on the per-area aggregation (this is slow, as DBLP is a 3GB XML file). To be able to process such a large XML file, we use a stream processor that simply dumps interesting publications into Pub objects (see pubs.py).
- top_authors.py leverages the pickle files to compute per-area statistics and aggregate statistics.
- author_cliques leverages the pickle files to calculate per-area author cliques.
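The streaming step can be sketched with the standard library's `xml.etree.ElementTree.iterparse`, which hands over each record as soon as it has been read instead of loading the whole file. This is a minimal sketch, not the code in `parse_dblp.py`: the `Pub` class here is a hypothetical stand-in for the one in `pubs.py`, the `stream_pubs` helper and the sample XML are made up, and the real DBLP dump may need extra handling (for example, for character entities declared in its DTD) that is not shown.

```python
import io
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Pub:
    """Hypothetical stand-in for the Pub class defined in pubs.py."""
    title: str
    venue: str
    year: int
    authors: list = field(default_factory=list)


def stream_pubs(source, venues):
    """Yield one Pub per <inproceedings> record whose <booktitle> is in venues."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event != "end" or elem.tag != "inproceedings":
            continue
        venue = elem.findtext("booktitle", default="")
        if venue in venues:
            title = elem.find("title")
            yield Pub(
                # itertext() also collects text inside nested markup such as <i>
                title="".join(title.itertext()) if title is not None else "",
                venue=venue,
                year=int(elem.findtext("year", default="0")),
                authors=[a.text for a in elem.findall("author")],
            )
        root.clear()  # drop the records parsed so far to keep memory bounded


# Made-up sample in the general shape of DBLP records.
SAMPLE = b"""<dblp>
<inproceedings key="conf/a/1"><author>Alice</author><author>Bob</author>
<title>Fuzzing <i>fast</i>.</title><year>2020</year><booktitle>CCS</booktitle></inproceedings>
<inproceedings key="conf/b/2"><author>Carol</author>
<title>Other.</title><year>2019</year><booktitle>XYZ</booktitle></inproceedings>
</dblp>"""

pubs = list(stream_pubs(io.BytesIO(SAMPLE), venues={"CCS"}))
for pub in pubs:
    print(pub.year, pub.venue, pub.title, pub.authors)
```

On a real run, the resulting objects would be grouped by area and written out with `pickle`, so that later stages do not have to parse the XML again.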
- Easy mode: check out the homepage
- make all to download DBLP data, pickle it, and create the HTML data
- make fresh to update DBLP data and pickle it
- make topauthors to create the top author pages
- make cliques to create the cliques
Ideas, comments, or improvements are welcome! Please reach out to Mathias Payer to discuss. You can also reach out to @gannimo on Twitter.
- 2021-02-09 fixed VLDB conference and added ICDE and PODS for the database community; added ASE and ISSTA for the software engineering community
- 2021-01-11 added HPCA for architecture and adjusted paper length calculation for DAC
- 2021-01-09 removed tutorials and short papers (by parsing pages data)
- 2021-01-05 figures for overview page
- 2021-01-04 new overview table across areas
- 2021-01-02 added author cliques
- 2020-12-30 first version with author statistics
This code and page were developed by Mathias Payer, initially over the 2020 holiday break. The site includes feedback and suggestions from too many people to list. Thank you for that!
We use information from DBLP and CSRankings for anti-aliasing of authors. The idea for the statistics was inspired by Davide's Software Security Circus.
All data in this repository is licensed under CC BY-NC-ND 4.0.