This repository is a fork of the original NanoGPT repository: https://github.com/karpathy/nanoGPT.
Logging is done via Weights & Biases. A sample project we used to log our experiment runs is here: https://wandb.ai/wg2404/NanoGPT-TransferLearning?workspace=user-wg2404.
In this work we extend the open-source NanoGPT repository to perform transfer learning on a pretrained GPT model for the Automatic Short Answer Grading (ASAG) task. By reducing the pretrained model to a simpler architecture with fewer parameters, optimizing the data loading portion of the training loop, timing and profiling the code, and switching to more powerful hardware, we are able to reduce runtime on the ASAG task without significantly harming performance. Our final results fall short of the state of the art; however, our framework can now be used to easily experiment with new architectures and settings to try to improve ASAG performance.
The existing train.py file is reused for the transfer learning implementation. A new 'init_from' option, 'transfer', activates transfer-learning mode. When this mode is active, a new data preprocessing loop runs and a new get_batch method is used. This mode also selects a new model type, the TransferGPT class, defined in model.py below the GPT class from the original implementation. TransferGPT defines the network architecture and forward-pass method for this model type, along with some helper methods.
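To make the idea concrete, here is a minimal sketch of how a transfer model can wrap the pretrained backbone with a small grading head. This is illustrative only: the class name TransferGPTSketch, the mean pooling, and the single linear head are assumptions; the real architecture is the TransferGPT class in model.py.

    import torch
    import torch.nn as nn
    from model import GPT, GPTConfig  # this repository's model.py

    class TransferGPTSketch(nn.Module):
        """Illustrative sketch only; the real implementation is TransferGPT in model.py."""

        def __init__(self, ckpt_path="pretrained/ckpt.pt"):
            super().__init__()
            ckpt = torch.load(ckpt_path, map_location="cpu")
            self.backbone = GPT(GPTConfig(**ckpt["model_args"]))
            # Strip the prefix torch.compile adds to checkpointed weights, if present.
            state_dict = {k.removeprefix("_orig_mod."): v for k, v in ckpt["model"].items()}
            self.backbone.load_state_dict(state_dict)
            # Hypothetical regression head: pooled hidden state -> predicted grade.
            self.head = nn.Linear(self.backbone.config.n_embd, 1)

        def forward(self, idx):
            pos = torch.arange(idx.size(1), device=idx.device)
            # Reuse the pretrained transformer body from the GPT backbone.
            x = self.backbone.transformer.wte(idx) + self.backbone.transformer.wpe(pos)
            for block in self.backbone.transformer.h:
                x = block(x)
            x = self.backbone.transformer.ln_f(x)
            return self.head(x.mean(dim=1)).squeeze(-1)  # mean-pool, then grade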
To perform hyperparameter tuning, a completely new hyperparam_sweep.py file is included. It needs much of the same data preprocessing functionality, so that functionality is split out (and, for now, unfortunately duplicated) in the mohler_dataset_preprocessing.py file.
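As a rough sketch of what a sweep can look like with Weights & Biases (the search space, the metric name, and the train_and_validate helper below are assumptions, not the actual contents of hyperparam_sweep.py):

    import wandb

    # Hypothetical search space; the real sweep in hyperparam_sweep.py may differ.
    sweep_config = {
        "method": "random",
        "metric": {"name": "val_loss", "goal": "minimize"},
        "parameters": {
            "learning_rate": {"values": [1e-4, 3e-4, 1e-3]},
            "dropout": {"values": [0.0, 0.1, 0.2]},
            "batch_size": {"values": [8, 16, 32]},
        },
    }

    def train_and_validate(cfg):
        # Placeholder: a real trial would run the transfer-learning
        # training loop with cfg and return the validation loss.
        return 0.0

    def run_one_trial():
        with wandb.init() as run:
            val_loss = train_and_validate(wandb.config)
            run.log({"val_loss": val_loss})

    sweep_id = wandb.sweep(sweep_config, project="NanoGPT-TransferLearning")
    wandb.agent(sweep_id, function=run_one_trial, count=20)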
To evaluate transfer learning, the eval_transfer.py script was created; it performs evaluation on the test set.
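For reference, ASAG systems are commonly scored with RMSE and Pearson correlation between predicted and gold grades; a small sketch of such a report follows (an assumption: eval_transfer.py may compute different metrics):

    import numpy as np
    from scipy.stats import pearsonr

    def report_asag_metrics(predicted, gold):
        # RMSE and Pearson's r between predicted and reference grades.
        predicted = np.asarray(predicted, dtype=float)
        gold = np.asarray(gold, dtype=float)
        rmse = float(np.sqrt(np.mean((predicted - gold) ** 2)))
        r, _ = pearsonr(predicted, gold)
        print(f"RMSE: {rmse:.3f}  Pearson r: {r:.3f}")

    report_asag_metrics([4.5, 3.0, 5.0], [5.0, 2.5, 4.0])  # toy numbers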
A Jupyter notebook, mohler_transfer_learning_experiments.ipynb, contains experiments that were run to understand the dataset and work out how to transform it before passing it into the network.
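As an illustration of the kind of transformation involved (the concatenation scheme below is an assumption; the actual preprocessing lives in mohler_dataset_preprocessing.py), each question/answer pair can be encoded with the same GPT-2 BPE tokenizer NanoGPT uses:

    import tiktoken

    enc = tiktoken.get_encoding("gpt2")  # the tokenizer NanoGPT pretrains with

    def encode_example(question, student_answer, grade, block_size=1024):
        # Hypothetical scheme: concatenate question and answer, truncate to
        # the model's context length, and pair the tokens with the grade.
        text = question + "\n" + student_answer
        ids = enc.encode_ordinary(text)[:block_size]
        return ids, float(grade)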
To perform the transfer learning operations, please ensure you are in an environment with everything in the Requirements.txt file installed. Also note that in order to actually run one of these transfer learning operations, an appropriate checkpoint file from a pretraining job must exist in the pretrained/ directory. If you would like to recreate the experiments detailed below and in our paper, please reach out to one of the authors of this repo, who can provide you with a checkpoint or explain how to create a new one.
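Dependencies can be installed with pip:

pip install -r Requirements.txt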
With everything in place, the different functionalities can be exercised as follows:
Preparing the OpenWebText dataset:
python data/openwebtext/prepare.py
Pretraining:
python train.py config/train_12_768_1024.py
Transfer Learning Training:
python train.py config/transfer_train.py
Evaluation:
python eval_transfer.py
Hyperparameter Sweep:
python hyperparam_sweep.py