This repository provides code to generate ARC-like datasets, build a custom T5-based model with 2D positional embeddings, and train end-to-end on those tasks. It is structured for quick experimentation with different ARC tasks, custom tokenizers, and a flexible pipeline using Hugging Face Transformers and PyTorch Lightning.
License: MIT.
- Python 3.10 (recommended: 3.10.12)
- Additional packages as listed in:
  - requirements.txt (strict pinned versions)
  - setup_full.py (if you need a more strongly pinned environment)
- Clone the Repository & Create a Virtual Environment
    git clone https://github.com/khalil-research/ViTARC.git
    cd ViTARC
    python3.10 -m venv venv
    source venv/bin/activate
- Install the Package
    pip install --upgrade pip
    pip install -e .
- Run the Training Script
    python vitarc/training/train.py
Command-line arguments:
    --task_idx (int, default=0)
    --max_input_length (int, default=1124)
    --max_target_length (int, default=1124)
    --batch_size (int, default=8)
    --epochs (int, default=1)
    --seed (int, default=1230)
    --ds_base_dir (str, default="./arc_x2y_datasets")
    --use_slurm_copy (bool, default=False)  # HPC copy to SLURM_TMPDIR
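For sweeps over several tasks it can be convenient to assemble the training command programmatically. The helper below is a minimal sketch, not part of the repository: `build_train_command` is a hypothetical name, and it only uses the script path and flag names listed above. `--use_slurm_copy` is left out on purpose, since the list above does not say how the script parses boolean values.

```python
import sys

# Script path as given in this README; run from the repository root.
TRAIN_SCRIPT = "vitarc/training/train.py"


def build_train_command(**overrides):
    """Return an argv list for the training script (hypothetical helper).

    Each keyword becomes a `--name value` pair; flags that are not passed
    keep the defaults defined by the script itself.
    """
    cmd = [sys.executable, TRAIN_SCRIPT]
    for name, value in overrides.items():
        cmd += [f"--{name}", str(value)]
    return cmd


cmd = build_train_command(task_idx=3, batch_size=16, epochs=5, seed=1230)
print(" ".join(cmd[1:]))
# To launch the run: subprocess.run(cmd, check=True)
```

Building an argv list rather than a shell string avoids quoting issues when a value such as `--ds_base_dir` contains spaces.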
.
├── arc_x2y_datasets
├── requirements.txt
├── setup.py
├── setup_full.py
└── vitarc
    ├── datasets
    │   ├── gen_dataset.py
    │   └── obj_idx_utils.py
    ├── external
    │   └── re_arc
    │       ├── LICENSE_re_arc
    │       ├── README.md
    │       ├── dsl.py
    │       ├── generators.py
    │       ├── main.py
    │       ├── utils.py
    │       └── verifiers.py
    ├── models
    │   └── model.py
    ├── tests
    │   ├── test_gen_dataset.py
    │   └── test_tokenizer.py
    ├── tokenizers
    │   └── arc_tokenizer_v1
    └── training
        └── train.py
We generate ARC-like datasets by leveraging code from re_arc (MIT licensed, https://github.com/michaelhodel/re-arc), included under vitarc/external/re_arc. See LICENSE_re_arc in that folder for more details.
A simple usage example:
from vitarc.datasets.gen_dataset import generate_single_dataset_hf

task_key, final_ds, stats = generate_single_dataset_hf(
    task_idx=0,
    seed=1230,
    n_examples=1000,
    testsize=10,
)
This returns a Hugging Face DatasetDict with train, validation, and test splits. See tests/test_gen_dataset.py for more detailed usage.
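A quick way to check what was generated is to count the examples per split. The sketch below relies only on two facts about the `datasets` library: a DatasetDict is a dict mapping split names to datasets, and `len()` of a dataset is its number of rows. `split_sizes` is a hypothetical helper, and the toy input is a plain-dict stand-in with made-up sizes so the snippet runs without generating data.

```python
def split_sizes(dataset_dict):
    """Map each split name to its number of examples (hypothetical helper)."""
    return {name: len(split) for name, split in dataset_dict.items()}


# Stand-in with the same split names; the sizes here are illustrative only.
# With real data, pass the `final_ds` object returned by the generator.
toy = {"train": [0] * 8, "validation": [0], "test": [0]}
print(split_sizes(toy))
```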
We use a Hugging Face–style tokenizer for ARC tasks:
from vitarc.tokenizers.arc_tokenizer import get_or_build_arc_tokenizer
tokenizer = get_or_build_arc_tokenizer("arc_tokenizer_v1")
This returns a tokenizer configured for ARC-like inputs/outputs. See tests/test_tokenizer.py for an example.
ViTARCForConditionalGeneration is a specialized T5-based model for the ViTARC project, extending T5ForConditionalGeneration. It adds various positional-embedding and relative-attention features beyond vanilla T5:
- 2D absolute positional embeddings (ape_type="SinusoidalAPE2D", etc.)
- Relative attention with multi-slope Alibi (rpe_type="Four-diag-slope-Alibi" or "Two-slope-Alibi")
- Object-based embeddings (controlled by use_OPE=True or False)
- Custom embedding mixer strategies (ape_mixer="weighted_sum_no_norm_vec", "learnable_scaling", etc.)
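To make the idea of a 2D absolute positional embedding concrete, here is a minimal pure-Python sketch of one common construction: half of the channels carry a standard sinusoidal encoding of the row index, the other half encode the column index. This illustrates the general technique only; the channel layout, the 10000 base, and the function names are assumptions and may differ from what vitarc/models/model.py does.

```python
import math


def sinusoidal_1d(pos, dim):
    """Standard sin/cos encoding of a scalar position into `dim` values (dim even)."""
    out = []
    for i in range(0, dim, 2):
        freq = 1.0 / (10000 ** (i / dim))
        out.append(math.sin(pos * freq))
        out.append(math.cos(pos * freq))
    return out


def sinusoidal_2d(rows, cols, d_model):
    """Return rows*cols vectors in raster order.

    The first half of each vector encodes the row, the second half the column
    (an assumed layout, chosen for illustration).
    """
    if d_model % 4 != 0:
        raise ValueError("d_model must be divisible by 4")
    half = d_model // 2
    return [
        sinusoidal_1d(r, half) + sinusoidal_1d(c, half)
        for r in range(rows)
        for c in range(cols)
    ]


table = sinusoidal_2d(rows=3, cols=4, d_model=8)
print(len(table), len(table[0]))
```

With this layout, two cells in the same column share the second half of their vectors, and two cells in the same row share the first half, which is exactly the structure a flattened 1D encoding loses.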
When instantiating ViTARCForConditionalGeneration via a T5Config, the model looks for the following fields (if present):
- ape_type (str):
  - Examples: "SinusoidalAPE", "SinusoidalAPE2D", "LearnedAPE", or "none".
  - Defaults to "SinusoidalAPE2D".
- rpe_type (str):
  - Examples: "Four-diag-slope-Alibi", "Two-slope-Alibi".
  - Defaults to "Two-slope-Alibi".
- rpe_abs (bool):
  - Whether to combine absolute and relative positional embeddings.
  - Defaults to True if not set.
- use_OPE (bool):
  - Enables object-based embeddings. Defaults to True.
- ape_mixer (str):
  - Supported strategies:
    - 'hardcoded_normalization'
    - 'learnable_scaling'
    - 'weighted_sum'
    - 'weighted_sum_no_norm'
    - 'learnable_scaling_vec'
    - 'weighted_sum_vec'
    - 'weighted_sum_no_norm_vec'
    - 'positional_attention'
    - 'layer_norm'
    - 'default'
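The rpe_type values name ALiBi-style attention biases, where a penalty proportional to the distance between query and key is added to the attention logits. The sketch below shows one plausible reading of a "two-slope" variant on a grid. Everything specific in it is an assumption made for illustration: the Manhattan distance, the choice of slope by raster order (keys at or before the query versus after it), and the slope values. Consult vitarc/models/model.py for the actual behavior.

```python
def manhattan(p, q):
    """Manhattan distance between two (row, col) cells."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def two_slope_bias(rows, cols, slope_before, slope_after):
    """Return bias[q][k], the additive attention bias for tokens in raster order.

    Keys at or before the query use `slope_before`; keys after it use
    `slope_after` (an assumed rule, for illustration only).
    """
    coords = [(r, c) for r in range(rows) for c in range(cols)]
    bias = []
    for qi, q in enumerate(coords):
        row = []
        for ki, k in enumerate(coords):
            slope = slope_before if ki <= qi else slope_after
            row.append(-slope * manhattan(q, k))
        bias.append(row)
    return bias


bias = two_slope_bias(rows=2, cols=2, slope_before=0.5, slope_after=0.25)
```

In a multi-head model, each head would typically get its own pair of slopes, so some heads attend locally while others see farther across the grid.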
Below is an example usage that sets some of these fields:
from transformers import T5Config
from vitarc.models.model import ViTARCForConditionalGeneration
from vitarc.tokenizers.arc_tokenizer import get_or_build_arc_tokenizer

tokenizer = get_or_build_arc_tokenizer("arc_tokenizer_v1")

config = T5Config(
    vocab_size=len(tokenizer),
    d_model=128,
    num_layers=3,
    num_decoder_layers=3,
    num_heads=8,
    d_ff=256,
    dropout_rate=0.1,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
    bos_token_id=tokenizer.bos_token_id,
    decoder_start_token_id=tokenizer.pad_token_id,
    rows=33,  # Custom field used by ViTARC for 2D embeddings
    cols=34,  # Custom field used by ViTARC for 2D embeddings
    # ViTARC-specific fields:
    ape_type="SinusoidalAPE2D",
    rpe_type="Two-slope-Alibi",
    rpe_abs=True,
    use_OPE=True,
    ape_mixer="weighted_sum_no_norm_vec",  # or "learnable_scaling", "weighted_sum", ...
)

model = ViTARCForConditionalGeneration(config)
See vitarc/training/train.py for a full training loop based on PyTorch Lightning.
If you use or reference this code, please cite:
Tackling the Abstraction and Reasoning Corpus with Vision Transformers: the Importance of 2D Representation, Positions, and Objects
arXiv:2410.06405
This code relies on:
- Hugging Face Transformers
- PyTorch & PyTorch Lightning
- re_arc (MIT license) for generating ARC tasks
- Datasets (Hugging Face)