ORiGAMi - Object Representation through Generative Autoregressive Modelling

Disclaimer

Please note: This tool is not officially supported or endorsed by MongoDB, Inc. The code is released for use "AS IS" without any warranties of any kind, including, but not limited to, its installation, use, or performance. Do not run this tool against critical production systems.

Overview

ORiGAMi is a transformer-based Machine Learning model for supervised classification from semi-structured data such as MongoDB documents or JSON files.

Typically, when working with semi-structured data in a Machine Learning context, the data needs to be flattened into a tabular format first. This flattening can be lossy, especially in the presence of arrays and nested objects, and often requires domain expertise to extract meaningful higher-order features from the raw data. This feature extraction step is manual, slow, and expensive, and it doesn't scale well.
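To make the lossiness concrete, here is a minimal flattening sketch in Python. The `flatten` helper is hypothetical and not part of ORiGAMi. It turns nested objects into dotted column names and, because an array has no fixed number of columns, replaces each array with its length. That is just one of many hand-crafted choices a practitioner might make.

```python
from typing import Any


def flatten(doc: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten a JSON-like document into a single table row (hypothetical helper).

    Nested objects become dotted column names. Arrays have no fixed number of
    columns, so this sketch replaces each array with its length, a simple
    hand-crafted feature that discards the array's contents.
    """
    row: dict[str, Any] = {}
    for key, value in doc.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested objects become dotted column names, e.g. "address.city".
            row.update(flatten(value, prefix=f"{column}."))
        elif isinstance(value, list):
            # Lossy: only the number of elements survives.
            row[f"{column}.length"] = len(value)
        else:
            row[column] = value
    return row


doc = {
    "name": "Alice",
    "address": {"city": "Sydney", "zip": "2000"},
    "orders": [{"item": "book", "qty": 2}, {"item": "pen", "qty": 10}],
}
print(flatten(doc))
# {'name': 'Alice', 'address.city': 'Sydney', 'address.zip': '2000', 'orders.length': 2}
```

Two documents with completely different orders but the same number of them produce identical rows, so a tabular model cannot tell them apart. Keeping more of the array (first element, totals, per-item columns) means more manual feature engineering.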

ORiGAMi circumvents this by directly operating on JSON data. Once a model is trained, it can be used to make predictions on any field in the dataset.
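The sketch below illustrates, under our own assumptions, one way a nested document could be written as a single sequence that a left-to-right (autoregressive) model might consume without flattening. The token format and the `linearize` and `move_target_last` helpers are hypothetical illustrations, not ORiGAMi's actual tokenizer or API.

```python
from typing import Any


def linearize(value: Any) -> list[str]:
    """Turn a JSON-like value into a flat token sequence (hypothetical scheme).

    Objects and arrays are wrapped in start/end tokens, so nesting is kept
    and the original structure can be reconstructed from the tokens.
    """
    if isinstance(value, dict):
        tokens = ["{"]
        for key, child in value.items():
            tokens.append(f"key:{key}")
            tokens.extend(linearize(child))
        tokens.append("}")
        return tokens
    if isinstance(value, list):
        tokens = ["["]
        for child in value:
            tokens.extend(linearize(child))
        tokens.append("]")
        return tokens
    # Primitive values become a single token.
    return [f"val:{value!r}"]


def move_target_last(doc: dict[str, Any], target: str) -> dict[str, Any]:
    """Reorder top-level fields so the target field comes last.

    A left-to-right model can then predict the target's value tokens
    conditioned on every other field in the document.
    """
    rest = {k: v for k, v in doc.items() if k != target}
    return {**rest, target: doc[target]}


doc = {"label": "gold", "tags": ["a", "b"], "stats": {"hp": 12}}
tokens = linearize(move_target_last(doc, "label"))
print(tokens)
# ['{', 'key:tags', '[', "val:'a'", "val:'b'", ']', 'key:stats', '{', 'key:hp', 'val:12', '}', 'key:label', "val:'gold'", '}']
```

In this sketch, arrays and nested objects survive intact as bracketed token runs, and predicting a different field only requires moving that field to the end of the sequence.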

Installation

ORiGAMi requires Python version 3.10 or 3.11. We recommend using a virtual environment, such as Python's native venv.
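Before installing, you may want to confirm that the interpreter inside your virtual environment is one of the supported versions. The `is_supported` helper below is a hypothetical convenience check, not part of ORiGAMi; it only encodes the 3.10/3.11 requirement stated above.

```python
import sys

# Supported versions as stated in this README: Python 3.10 or 3.11.
SUPPORTED = {(3, 10), (3, 11)}


def is_supported(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) version is 3.10 or 3.11."""
    return (version_info[0], version_info[1]) in SUPPORTED


if __name__ == "__main__":
    current = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(f"Python {current}:", "supported" if is_supported() else "not supported")
```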

To install ORiGAMi with pip, run:

pip install origami-ml

You can also clone the repository to your local machine and install the dependencies manually:

git clone https://github.com/mongodb-labs/origami.git
cd origami
pip install -r requirements.txt
pip install -e .

Usage

ORiGAMi comes with a command line interface (CLI) and a Python SDK.

Usage from the Command Line

The CLI allows you to train a model and make predictions with a trained model. After installation, run origami from your shell to see an overview of the available commands.

Help for specific commands is available with origami <command> --help, where <command> is currently either train or predict. Note that the first run of the origami CLI tool can take longer than subsequent runs.

Detailed documentation for the CLI and available options can be found in CLI.md.

Usage with Python

For examples of how to use ORiGAMi from Python, see the notebooks in the ./notebooks folder, e.g. example_origami_dungeons.ipynb.

Experiment Reproduction

This code is released alongside our paper, which is available on arXiv: ORIGAMI: A generative transformer architecture for predictions from semi-structured data. To reproduce the experiments in the paper, follow the instructions in the ./experiments/ directory.
