Artifact | DOI |
---|---|
Java corpus | https://doi.org/10.7488/ds/1690 |
C corpus | https://doi.org/10.5281/zenodo.3628775 |
Python corpus | https://doi.org/10.5281/zenodo.3628784 |
Java, pre-processed | https://doi.org/10.5281/zenodo.3628665 |
C, pre-processed | https://doi.org/10.5281/zenodo.3628638 |
Python, pre-processed | https://doi.org/10.5281/zenodo.3628636 |
Trained models | https://doi.org/10.5281/zenodo.3628628 |

Codeprep library (for vocabulary study): https://github.com/giganticode/codeprep

Open-vocabulary Neural LM: https://github.com/mast-group/OpenVocabCodeNLM

If you use the artifacts, please cite the paper:
@article{karampatsis2020big,
title={Big Code != Big Vocabulary: Open-Vocabulary Models for Source Code},
author={Karampatsis, Rafael-Michael and Babii, Hlib and Robbes, Romain and Sutton, Charles and Janes, Andrea},
journal={arXiv preprint arXiv:2003.07914},
year={2020}
}