- This repository provides an example of reading from a single shared memory tensor from multiple processes (e.g., with DDP).
- Useful for loading a large tensor (e.g., the entire dataset) to the CPU to speed up I/O without incurring Nx memory usage where N is the number of GPUs/processes
- We use the standard `torch.utils.data.DataLoader`, which might make it easier for you to use this in your own code
- Works with `torchrun`
- Does not depend on `detectron2`
- We did not test this script in the multi-node setting; it probably would not work.
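The core idea can be sketched as follows. This is a minimal illustration, not the repo's actual `main-multigpu-shared.py`; the helper names (`_read`, `demo`) are hypothetical. A tensor moved into shared memory with `share_memory_()` is mapped, not copied, by child processes:

```python
import torch
import torch.multiprocessing as mp

def _read(shared, out_q):
    # Child reads directly from the shared storage; no per-process copy is made
    out_q.put(float(shared.sum()))

def demo(num_procs=2):
    # One tensor backed by shared memory, visible to every child process
    data = torch.arange(8, dtype=torch.float32)
    data.share_memory_()          # move the storage to shared memory in-place
    assert data.is_shared()

    ctx = mp.get_context("fork")  # fork is acceptable here since Linux is a requirement
    q = ctx.Queue()
    procs = [ctx.Process(target=_read, args=(data, q)) for _ in range(num_procs)]
    for p in procs:
        p.start()
    sums = [q.get() for _ in procs]
    for p in procs:
        p.join()
    return sums

if __name__ == "__main__":
    print(demo())
```

With DDP/`torchrun` the processes are launched for you, but the mechanism is the same: each rank maps the one shared copy instead of materializing its own.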
(`N` is the number of GPUs/processes)

- Run `torchrun --standalone --nproc_per_node=N main-multigpu-naive.py`
- Look at the memory usage.
- Run `torchrun --standalone --nproc_per_node=N main-multigpu-shared.py`
- Look at the memory usage again.
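When comparing the two runs, note that RSS counts shared pages once in every process, so it overstates usage in the shared case; USS (unique set size) better reflects actual duplication. A minimal way to check per-process numbers with `psutil` (which the repo installs); the helper name `mem_mb` is hypothetical:

```python
import os
import psutil

def mem_mb(pid=None):
    """Return (rss, uss) of a process in MiB.

    RSS includes pages shared with other processes, so it is inflated
    in the shared-memory case; USS counts only memory unique to the
    process. memory_full_info() needs Linux (it reads /proc/<pid>/smaps).
    """
    p = psutil.Process(pid if pid is not None else os.getpid())
    info = p.memory_full_info()
    return info.rss / 2**20, info.uss / 2**20

if __name__ == "__main__":
    rss, uss = mem_mb()
    print(f"RSS={rss:.1f} MiB, USS={uss:.1f} MiB")
```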
- Python >= 3.7
- Linux
- PyTorch >= 1.10
```
pip install psutil tabulate tensordict
```
Inspired by and modified from https://github.com/ppwwyyxx/RAM-multiprocess-dataloader
See also: