A web-based Agile story point estimator
View Demo »
All of the datasets for the 16 different projects are available in the marked_data folder. Each dataset has the following 5 columns:
- issuekey: Issue ID
- title: Issue Title
- description: Issue Description
- storypoint: Assigned Story Point of the Issue
- split_mark: Indicates whether the row was used for training, validation, or testing
issuekey | title | description | storypoint | split_mark |
---|---|---|---|---|
... | ... | ... | ... | ... |
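For orientation, here is a minimal sketch of loading one project's dataset with pandas and recovering the original splits via the split_mark column. The file name and the literal split_mark values ("train", "valid", "test") are assumptions; check the files in marked_data for the exact names.

```python
import pandas as pd

# Minimal sketch: load one project's dataset from the marked_data folder.
# The file name below is an assumed example; any of the 16 datasets works.
df = pd.read_csv("marked_data/appceleratorstudio.csv")

# Recover the original partitions via split_mark. The literal values
# "train"/"valid"/"test" are assumptions; inspect df["split_mark"].unique().
train_df = df[df["split_mark"] == "train"]
valid_df = df[df["split_mark"] == "valid"]
test_df = df[df["split_mark"] == "test"]

print(len(train_df), len(valid_df), len(test_df))
```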
All of the models on the HuggingFace Model Hub and Google Drive follow the naming convention described in the following table:
Model ID | Model Specification | Experiment Scenario |
---|---|---|
#0 | BPE GPT2 Tokenizer + Custom Pre-trained GPT-2 (GPT2SP) | Within-Project |
#00 | BPE GPT2 Tokenizer + Custom Pre-trained GPT-2 (GPT2SP) | Within-Repository |
#000 | BPE GPT2 Tokenizer + Custom Pre-trained GPT-2 (GPT2SP) | Cross-Repository |
#2 | Word-levelSP Tokenizer + Custom Pre-trained GPT-2 | Within-Project |
#22 | Word-levelSP Tokenizer + Custom Pre-trained GPT-2 | Within-Repository |
#222 | Word-levelSP Tokenizer + Custom Pre-trained GPT-2 | Cross-Repository |
#6 | WordPieceSP Tokenizer + Custom Pre-trained GPT-2 | Within-Project |
#66 | WordPieceSP Tokenizer + Custom Pre-trained GPT-2 | Within-Repository |
#666 | WordPieceSP Tokenizer + Custom Pre-trained GPT-2 | Cross-Repository |
#7 | SentencePieceSP Tokenizer + Custom Pre-trained GPT-2 | Within-Project |
#77 | SentencePieceSP Tokenizer + Custom Pre-trained GPT-2 | Within-Repository |
#777 | SentencePieceSP Tokenizer + Custom Pre-trained GPT-2 | Cross-Repository |
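To illustrate how the naming convention maps to code, the sketch below loads one within-project GPT2SP checkpoint (Model ID #0) with the HuggingFace transformers library and scores a single issue. The repository ID is a placeholder, and the assumption that the checkpoints expose a GPT-2 sequence-classification head acting as a story-point regressor should be verified against the actual Model Hub entries.

```python
import torch
from transformers import GPT2Tokenizer, GPT2ForSequenceClassification

# Placeholder Hub repository ID following the "#0" (GPT2SP, Within-Project)
# row of the table above; replace it with the real Model Hub path.
MODEL_ID = "your-org/0-GPT2SP-appceleratorstudio"

tokenizer = GPT2Tokenizer.from_pretrained(MODEL_ID)
model = GPT2ForSequenceClassification.from_pretrained(MODEL_ID)
model.eval()

# Estimate a story point for one issue (title + description concatenated).
text = "Add OAuth2 support: the login service should accept OAuth2 tokens"
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits  # shape (1, 1) if the head is a regressor
print(f"Estimated story points: {logits.item():.2f}")
```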
- Three different pre-trained tokenizers can be found in the all_tokenizers folder: the Word-levelSP Tokenizer, WordPieceSP Tokenizer, and SentencePieceSP Tokenizer (a loading sketch follows this list)
- All of the models included in our experiments can be found on the Model Hub provided by HuggingFace
- For your information, the models can also be downloaded from this Google Drive
- All of the training scripts for the different pre-trained tokenizers used in our experiments (RQ3) can be found in tokenizer_training_notebook.ipynb
- The model training scripts can be found in model_training_notebook.ipynb, which contains the full model training process for our experiments (RQ1 + RQ2 + RQ3)
- All of the raw predictions from the different experiments can be found in the testing_results folder, where the folders are organized by experimental scenario (i.e., within-project and cross-project) and each scenario has four different models as follows:
- GPT2SP
- SentencePieceSP+GPT2
- Word-levelSP+GPT2
- WordPieceSP+GPT2
- Access the GPT2SP web app here to interact with our GPT2SP model and explore the datasets
- Special thanks to DeepSE's developers for providing the datasets and the replication package.
- Special thanks to the developers of PyTorch, HuggingFace, Streamlit, and Transformers Interpret for providing amazing frameworks for the community
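As a companion to the all_tokenizers bullet above, here is a hedged sketch of loading one of the pre-trained tokenizers with the HuggingFace tokenizers library. The JSON file name and serialization format are assumptions about how the tokenizers are saved; adjust them to match the actual files in the folder.

```python
from tokenizers import Tokenizer

# Assumed file name and format: the pre-trained tokenizers in all_tokenizers
# are presumed to be saved as HuggingFace tokenizers JSON files.
word_level = Tokenizer.from_file("all_tokenizers/Word-levelSP.json")

encoding = word_level.encode("Implement caching for the search endpoint")
print(encoding.tokens)  # tokens produced by the tokenizer
print(encoding.ids)     # integer IDs fed to the GPT-2 model
```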