Update steps

make fresh
python3 parse_dblp.py
YEAR=2018 python3 top_authors.py # data from the last 5 years
YEAR=2020 python3 top_authors.py # data from the last 3 years

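Because the ccf-a and Tsinghua-a data sets are very large, a browser may freeze when loading them; converting the pages to txt with w3m is recommended.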
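A minimal sketch of such a conversion, assuming w3m is installed and on the PATH. The helper names are hypothetical and not part of this repository; `-dump` and `-T` are standard w3m options.

```python
import shutil
import subprocess


def w3m_dump_command(html_path):
    """Build the w3m command line that renders an HTML file as plain text."""
    # -dump writes the rendered page to stdout; -T forces the input type.
    return ["w3m", "-dump", "-T", "text/html", html_path]


def html_to_text(html_path, txt_path):
    """Render html_path with w3m and write the plain-text result to txt_path."""
    if shutil.which("w3m") is None:
        raise RuntimeError("w3m not found on PATH")
    result = subprocess.run(
        w3m_dump_command(html_path), check=True, capture_output=True
    )
    with open(txt_path, "wb") as out:
        out.write(result.stdout)
```

Calling `html_to_text("page.html", "page.txt")` (placeholder file names) would then produce a text file that can be read in a pager or editor instead of a browser.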

Publication statistics

This repository establishes simple statistics for a set of conferences.

Using the DBLP data set, we extract the publications of the top conferences and aggregate them on a per-author basis. For each subgroup (e.g., security, embedded systems, or OS), we then calculate per-author statistics and present them in an overview.
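The per-author aggregation can be illustrated with a small sketch. Everything here is an assumption for illustration: the record layout, the `AREAS` mapping, and the function name are hypothetical, and the actual scripts work on Pub objects whose fields may differ. The `min_year` cutoff is similar in spirit to the `YEAR` variable passed to top_authors.py.

```python
from collections import Counter

# Hypothetical records: (venue, year, authors).
PUBS = [
    ("CCS", 2019, ["Alice", "Bob"]),
    ("OSDI", 2020, ["Alice"]),
    ("CCS", 2016, ["Bob", "Carol"]),
]

# Hypothetical mapping from an area to its set of venues.
AREAS = {"security": {"CCS"}, "os": {"OSDI"}}


def papers_per_author(pubs, venues, min_year):
    """Count papers per author at the given venues from min_year onward."""
    counts = Counter()
    for venue, year, authors in pubs:
        if venue in venues and year >= min_year:
            counts.update(authors)
    return counts


security = papers_per_author(PUBS, AREAS["security"], 2018)
print(security.most_common())
```

Aggregate statistics across areas follow the same pattern with the union of all venue sets passed as `venues`.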

Processing happens in two stages, parsing followed by analysis, spread over three scripts:

  • parse_dblp.py extracts all publications and dumps them into pickle files based on the per-area aggregation (this is slow because DBLP is a 3 GB XML file). To process such a large XML file, we use a stream processor that simply dumps interesting publications into Pub objects (see pubs.py).
  • top_authors.py leverages the pickle files to process per-area statistics and aggregate statistics.
  • author_cliques leverages the pickle files to calculate per-area author cliques.
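The streaming approach of the first stage can be sketched as follows. This is not the code in parse_dblp.py: the `Pub` dataclass is a hypothetical stand-in for the class in pubs.py, the function names are made up, and the standard-library `xml.etree.ElementTree.iterparse` is used for illustration. The real DBLP dump also relies on a DTD for character entities, which this sketch does not handle.

```python
import io
import pickle
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Pub:
    """Hypothetical stand-in for the Pub class in pubs.py."""
    title: str
    venue: str
    year: int
    authors: list = field(default_factory=list)


def stream_pubs(source, venues):
    """Yield a Pub for each <inproceedings> record whose booktitle is in venues."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event != "end" or elem.tag != "inproceedings":
            continue
        venue = elem.findtext("booktitle", default="")
        if venue in venues:
            title = elem.find("title")
            yield Pub(
                title="".join(title.itertext()) if title is not None else "",
                venue=venue,
                year=int(elem.findtext("year", default="0")),
                authors=[a.text for a in elem.findall("author")],
            )
        # Drop finished records so memory stays bounded on very large inputs.
        root.clear()


def save_pubs(pubs, path):
    """Write a list of Pub objects to a pickle file for the later stages."""
    with open(path, "wb") as out:
        pickle.dump(pubs, out)


SAMPLE = b"""<dblp>
<inproceedings key="conf/ccs/A19">
  <author>Alice</author><author>Bob</author>
  <title>Paper One.</title><year>2019</year><booktitle>CCS</booktitle>
</inproceedings>
<article key="journals/x/B20">
  <author>Carol</author><title>Paper Two.</title><year>2020</year>
</article>
<inproceedings key="conf/other/C20">
  <author>Dave</author>
  <title>Paper Three.</title><year>2020</year><booktitle>Other</booktitle>
</inproceedings>
</dblp>"""

pubs = list(stream_pubs(io.BytesIO(SAMPLE), {"CCS"}))
print(pubs)
```

Because records are handled one at a time and discarded afterwards, the whole XML tree never has to fit in memory. A later stage would read the file written by `save_pubs` back with `pickle.load`.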

Using/Howto

  • Easy mode: check out the homepage
  • make all to download the DBLP data, pickle it, and create the HTML data
  • make fresh to update DBLP data and pickle it
  • make topauthors to create the top author pages
  • make cliques to create the cliques

Contributing

Ideas, comments, or improvements are welcome! Please reach out to Mathias Payer to discuss. You can also reach out to @gannimo on Twitter.

Changelog

  • 2021-02-09 fixed VLDB conference and added ICDE and PODS for the database community; added ASE and ISSTA for the software engineering community
  • 2021-01-11 added HPCA for architecture and adjusted paper length calculation for DAC
  • 2021-01-09 removed tutorials and short papers (by parsing pages data)
  • 2021-01-05 figures for overview page
  • 2021-01-04 new overview table across areas
  • 2021-01-02 added author cliques
  • 2020-12-30 first version with author statistics

Acknowledgements

This code and page were developed by Mathias Payer, initially over the 2020 holiday break. The site includes feedback and suggestions from too many people to list. Thank you all!

We use information from DBLP and CSRankings to resolve author name aliases. The idea for the statistics was inspired by Davide's Software Security Circus.

License

All data in this repository is licensed under CC BY-NC-ND 4.0.