
GPT2SP Replication Package


GPT2SP

A web-based Agile story point estimator
View Demo »

Table of Contents
  1. How to replicate
  2. Acknowledgements
  3. License

How to replicate

About the Datasets

All of the datasets for the 16 different projects are available in the marked_data folder. Each dataset has the following five columns:

  1. issuekey: Issue ID
  2. title: Issue Title
  3. description: Issue Description
  4. storypoint: Assigned Story Point of the Issue
  5. split_mark: Indicates whether the row was used for training, validation, or testing
| issuekey | title | description | storypoint | split_mark |
| --- | --- | --- | --- | --- |
| ... | ... | ... | ... | ... |
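The five columns above can be loaded and partitioned by the split_mark label. Below is a minimal sketch using only the standard library; an inline CSV stands in for a real file from the marked_data folder, and the split_mark values ("train", "valid", "test") are an assumption about the labeling scheme used in the files.

```python
import csv
import io

# Inline stand-in for one project's dataset; the real files live in
# marked_data (e.g. open("marked_data/<project>.csv")). The split_mark
# labels used here are assumed, not confirmed by the repository.
raw = io.StringIO(
    "issuekey,title,description,storypoint,split_mark\n"
    "PROJ-1,Fix login bug,...,3,train\n"
    "PROJ-2,Add search,...,5,valid\n"
    "PROJ-3,Refactor API,...,8,test\n"
)

rows = list(csv.DictReader(raw))

# Partition rows by their split_mark label.
splits = {"train": [], "valid": [], "test": []}
for row in rows:
    splits[row["split_mark"]].append(row)

print({k: len(v) for k, v in splits.items()})  # {'train': 1, 'valid': 1, 'test': 1}
```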

About the Models

Model Naming Convention

All of the models on the HuggingFace Model Hub and Google Drive follow the naming convention described in the following table:

| Model ID | Model Specification | Experiment Scenario |
| --- | --- | --- |
| #0 | BPE GPT2 Tokenizer + Custom Pre-trained GPT-2 (GPT2SP) | Within-Project |
| #00 | BPE GPT2 Tokenizer + Custom Pre-trained GPT-2 (GPT2SP) | Within-Repository |
| #000 | BPE GPT2 Tokenizer + Custom Pre-trained GPT-2 (GPT2SP) | Cross-Repository |
| #2 | Word-levelSP Tokenizer + Custom Pre-trained GPT-2 | Within-Project |
| #22 | Word-levelSP Tokenizer + Custom Pre-trained GPT-2 | Within-Repository |
| #222 | Word-levelSP Tokenizer + Custom Pre-trained GPT-2 | Cross-Repository |
| #6 | WordPieceSP Tokenizer + Custom Pre-trained GPT-2 | Within-Project |
| #66 | WordPieceSP Tokenizer + Custom Pre-trained GPT-2 | Within-Repository |
| #666 | WordPieceSP Tokenizer + Custom Pre-trained GPT-2 | Cross-Repository |
| #7 | SentencePieceSP Tokenizer + Custom Pre-trained GPT-2 | Within-Project |
| #77 | SentencePieceSP Tokenizer + Custom Pre-trained GPT-2 | Within-Repository |
| #777 | SentencePieceSP Tokenizer + Custom Pre-trained GPT-2 | Cross-Repository |
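The naming convention is regular enough to decode programmatically: the digit identifies the tokenizer variant, and the number of times it is repeated identifies the experiment scenario. A small sketch of that decoding (the helper function name is ours, not part of the replication package):

```python
# Lookup tables derived from the naming-convention table above.
TOKENIZERS = {
    "0": "BPE GPT2 Tokenizer",
    "2": "Word-levelSP Tokenizer",
    "6": "WordPieceSP Tokenizer",
    "7": "SentencePieceSP Tokenizer",
}
SCENARIOS = {1: "Within-Project", 2: "Within-Repository", 3: "Cross-Repository"}


def decode_model_id(model_id: str) -> tuple[str, str]:
    """Map a model ID like '#66' to its (tokenizer, scenario) pair."""
    digits = model_id.lstrip("#")
    return TOKENIZERS[digits[0]], SCENARIOS[len(digits)]


print(decode_model_id("#66"))  # ('WordPieceSP Tokenizer', 'Within-Repository')
```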

How to access the models

About the Model Training Process

About the Evaluation Results

  • All of the raw predictions from the different experiments can be found in the testing_results folder. The folders are organized by experimental scenario (i.e., within-project and cross-project), and each scenario contains results for four different models:
    • GPT2SP
    • SentencePieceSP+GPT2
    • Word-levelSP+GPT2
    • WordPieceSP+GPT2
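Raw predictions like these are typically scored with Mean Absolute Error (MAE), the usual metric for story point estimation. A minimal sketch, using hard-coded values in place of a testing_results file (the column layout of those files is not specified here, so reading one is left out):

```python
# Hypothetical predicted and ground-truth story points, standing in for
# one model's raw predictions from the testing_results folder.
predictions = [2.4, 5.1, 7.6]
actuals = [3, 5, 8]

# MAE: mean of the absolute differences between prediction and truth.
mae = sum(abs(p - a) for p, a in zip(predictions, actuals)) / len(actuals)
print(round(mae, 2))  # 0.37
```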

About the GPT2SP Web App

  • Access the GPT2SP web app here to interact with our GPT2SP model and browse the datasets

Acknowledgements

  • Special thanks to DeepSE's developers for providing the datasets and the replication package.
  • Special thanks to the developers of PyTorch, HuggingFace, Streamlit, and Transformers Interpret for providing amazing frameworks for the community.

License

MIT License
