Final project for COMPSCI 182/282A - Designing, Visualizing and Understanding Deep Neural Networks (Fa23)
This study investigates the in-context learning capabilities of Generative Pre-trained Transformer (GPT) models on logistic regression, including a kernelized variant based on the Radial Basis Function (RBF) kernel. We explore whether GPT models can effectively learn logistic regression in-context, comparing their performance with traditional machine learning baselines such as k-nearest neighbors (k-NN), support vector machines (SVM), and Gaussian process classifiers. Our methodology involves generating synthetic data for logistic regression tasks and training GPT models on these datasets, both with and without label noise. We also examine the impact of scaling on model accuracy. Our findings indicate that GPT models show promising results in learning logistic regression in-context, outperforming or matching most baselines across a range of scenarios. This research contributes to understanding the potential of GPT models on statistical tasks and opens avenues for further exploration of their in-context learning mechanisms.
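To make the task setup concrete, here is a minimal sketch of how synthetic classification data of this kind can be generated. The sampling choices, function names, and the RBF construction below are illustrative assumptions, not the exact code in src:

import numpy as np

def sample_logistic_task(n_points=40, n_dims=20, rng=None):
    # Vanilla task: label each Gaussian input by the sign of a random linear score.
    rng = rng or np.random.default_rng()
    xs = rng.standard_normal((n_points, n_dims))
    w = rng.standard_normal(n_dims)
    ys = np.where(xs @ w > 0, 1.0, -1.0)
    return xs, ys

def sample_rbf_task(n_points=40, n_dims=20, n_centers=5, gamma=1.0, rng=None):
    # Kernelized task: score each input with a random RBF expansion, then take the sign.
    rng = rng or np.random.default_rng()
    xs = rng.standard_normal((n_points, n_dims))
    centers = rng.standard_normal((n_centers, n_dims))
    alphas = rng.standard_normal(n_centers)
    sq_dists = ((xs[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    ys = np.where(np.exp(-gamma * sq_dists) @ alphas > 0, 1.0, -1.0)
    return xs, ys

Each (xs, ys) pair forms one in-context prompt: the transformer sees example pairs followed by a query point and must predict its label.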
The conda environment for this project can be installed using the following command (Linux):
conda env create -f environment.yml
Then install plotly, kaleido, and nbformat:
pip install plotly
pip install -U kaleido
pip install nbformat
Setup:
cd src
conda activate icl
export CUDA_VISIBLE_DEVICES=Your_GPU_ID
Vanilla Logistic Regression:
python train.py --config conf/logistic_regression.yaml
RBF Logistic Regression with Clean Training and Testing Data:
python train.py --config conf/rbf_logistic_regression.yaml
Vanilla and RBF Logistic Regression with Noisy Training and Testing Data:
python train.py --config conf/lr_noise0.2.yaml
python train.py --config conf/rbf_lr_noise0.2.yaml
We also experimented with noise probabilities of 0.05 and 0.1, which can be run by replacing 0.2 in the config filename with 0.05 or 0.1.
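Here "noise probability" refers to label noise. A minimal sketch, under the assumption that each label is flipped independently with probability noise_p (the helper name is hypothetical):

import numpy as np

def flip_labels(ys, noise_p, rng=None):
    # Flip each {-1, +1} label independently with probability noise_p.
    rng = rng or np.random.default_rng()
    flip = rng.random(len(ys)) < noise_p
    return np.where(flip, -ys, ys)

For example, noise_p=0.2 corresponds to the conf/lr_noise0.2.yaml run above.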
Scaling Experiments (input dimension and model size, noise probability 0.1):
python train.py --config conf/rbf_lr_noise0.1_dim[10/30/40].yaml
python train.py --config conf/rbf_lr_[small/tiny]_noise0.1.yaml
Query Scaling Analysis:
Modify the task and run_id in /ICL/src/analysis/query_scale.py, then run:
python /ICL/src/analysis/query_scale.py
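As a rough illustration of what such an analysis can look like, here is a hedged sketch that rescales the query input by different factors and checks whether the prediction survives; predict_fn and the scale grid are assumptions, not the script's actual interface:

import numpy as np

def eval_query_scaling(predict_fn, xs, ys, x_query, y_query, scales=(0.5, 1.0, 2.0, 4.0)):
    # predict_fn(xs, ys, x_query) -> scalar score for the query point.
    results = {}
    for s in scales:
        score = predict_fn(xs, ys, s * x_query)
        results[s] = float(np.sign(score) == y_query)  # 1.0 if still correct
    return results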
OOD Experiments:
Train: The working directory should be Your_path/ICL/src
# logistic regression
python train.py --config conf/ood/standard.yaml
python train.py --config conf/ood/opposite.yaml
python train.py --config conf/ood/random.yaml
python train.py --config conf/ood/orthogonal.yaml
python train.py --config conf/ood/proj.yaml
# rbf logistic regression
python train.py --config conf/ood/standard_rbf.yaml
python train.py --config conf/ood/opposite_rbf.yaml
python train.py --config conf/ood/random_rbf.yaml
python train.py --config conf/ood/orthogonal_rbf.yaml
python train.py --config conf/ood/proj_rbf.yaml
Evaluate and plot: The working directory should be Your_path/ICL/
export PYTHONPATH={YOUR_WORKING_DIR}/ICL/src
python {YOUR_WORKING_DIR}/ICL/src/analysis/ood.py
Random Label Experiments:
Train: The working directory should be Your_path/ICL/src/
# logistic regression
python train.py --config conf/randlb/None.yaml
python train.py --config conf/randlb/normal.yaml
python train.py --config conf/randlb/permute.yaml
python train.py --config conf/randlb/uniform.yaml
# rbf logistic regression
python train.py --config conf/randlb/None_rbf.yaml
python train.py --config conf/randlb/normal_rbf.yaml
python train.py --config conf/randlb/permute_rbf.yaml
python train.py --config conf/randlb/uniform_rbf.yaml
Evaluate and plot: The working directory should be Your_path/ICL/
export PYTHONPATH={YOUR_WORKING_DIR}/ICL/src
python {YOUR_WORKING_DIR}/ICL/src/analysis/randlb.py
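A hedged sketch of what the four randomization modes plausibly mean, based only on the config names (the function name is hypothetical; the authoritative definitions live in the configs):

import numpy as np

def randomize_labels(ys, mode, rng=None):
    # ys: true labels in {-1, +1} for one prompt.
    rng = rng or np.random.default_rng()
    if mode == "None":     # control: keep the true labels
        return ys
    if mode == "permute":  # shuffle the true labels across positions
        return rng.permutation(ys)
    if mode == "uniform":  # draw fresh labels uniformly from {-1, +1}
        return rng.choice([-1.0, 1.0], size=len(ys))
    if mode == "normal":   # replace labels with standard Gaussian values
        return rng.standard_normal(len(ys))
    raise ValueError(mode)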
OOD data generation and visualization:
export PYTHONPATH={YOUR_WORKING_DIR}/ICL/src
python src/ood_data_gen.py
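For intuition, here is a hedged sketch of how such OOD query inputs could be constructed. These are our guesses at the semantics of each config name; the authoritative definitions are in src/ood_data_gen.py:

import numpy as np

def make_ood_queries(xs, mode, rng=None):
    # xs: in-context inputs of shape (n_points, n_dims).
    rng = rng or np.random.default_rng()
    if mode == "standard":    # same distribution as the in-context inputs
        return rng.standard_normal(xs.shape)
    if mode == "opposite":    # negate the in-context inputs
        return -xs
    if mode == "random":      # fresh inputs at a different scale
        return 3.0 * rng.standard_normal(xs.shape)
    # For the span-based modes, build an orthonormal basis of the xs directions.
    q, _ = np.linalg.qr(xs.T)                  # columns of q span the row space of xs
    z = rng.standard_normal(xs.shape)
    if mode == "proj":        # project fresh samples onto the span of xs
        return z @ q @ q.T
    if mode == "orthogonal":  # remove all components along the span of xs
        return z - z @ q @ q.T                 # meaningful when n_points < n_dims
    raise ValueError(mode)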
Curriculum:
Training uses a curriculum over the input dimension: n_dims starts at 5 and ends at 20.
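A minimal sketch of such a dimension curriculum, under the assumption that inactive dimensions are zero-padded so the model's input width stays fixed (the class name and schedule parameters are illustrative):

import numpy as np

class DimCurriculum:
    # Grow the number of active input dimensions from start to end,
    # stepping up every `interval` training steps.
    def __init__(self, start=5, end=20, step=1, interval=2000):
        self.n_dims, self.end, self.step, self.interval = start, end, step, interval

    def update(self, train_step):
        if train_step > 0 and train_step % self.interval == 0:
            self.n_dims = min(self.n_dims + self.step, self.end)

    def sample_xs(self, n_points, max_dims=20, rng=None):
        rng = rng or np.random.default_rng()
        xs = np.zeros((n_points, max_dims))
        xs[:, :self.n_dims] = rng.standard_normal((n_points, self.n_dims))
        return xs  # unused dimensions stay zero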
Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?
Motivated by this paper, we test whether the labels really matter by replacing them with random values. In the training code we randomize the targets used for the loss while keeping the correct ys, so we can check whether the paper's claim holds in our setting. point_wise_loss is used only for logging, not for gradient updates, so only the training loss is modified.
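A sketch of the corresponding training-step change, in PyTorch-style pseudocode with hypothetical names (model, loss_fn); the repo's actual code differs in detail:

import torch

def train_step(model, xs, ys_true, ys_random, optimizer, loss_fn):
    # The gradient update uses the randomized targets ...
    preds = model(xs, ys_random)
    loss = loss_fn(preds, ys_random)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # ... while the point-wise loss against the true ys is computed for logging only.
    with torch.no_grad():
        point_wise_loss = loss_fn(preds, ys_true)
    return loss.item(), point_wise_loss.item()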
The code for this project is based on the following repositories: