Storing Data in the Repository for a "Frontend Only" Web App

Ryan Chase edited this page Jun 20, 2024 · 1 revision

The purpose of this page is to compile resources on data storage options and alternatives for the 311 Data project.

Storing Data Without Git Large File Storage (LFS)

TLDR:

  • without LFS, Git stores every committed version of a file in the repository's history
  • even after the file is deleted from the repo, its previously committed versions stay in the history, which can balloon the repo size and make cloning tedious for developers

Full top answer to the Stack Overflow question "What is the advantage of git LFS?":

One specificity of Git (and other distributed systems) compared to centralized systems is that each repository contains the whole history of the project. Suppose you create a 100 MB file, modify it 100 times in a way that doesn't compress well. You'll end up with a 10 GB repository. This means that each clone will download 10 GB of data, eat 10 GB of disk space on each machine on which you're making a clone. What's even more frustrating: you'd still have to download these 10 GB of data even if you git rm the big files.
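The arithmetic in the example above can be checked with a rough sketch. The function name and the `compression_ratio` parameter are hypothetical, introduced only for illustration; a ratio of 1.0 models the "doesn't compress well" case, where every revision is stored at close to full size.

```python
def estimated_history_size_mb(file_size_mb: float,
                              revisions: int,
                              compression_ratio: float = 1.0) -> float:
    """Rough upper bound on the history size of one file tracked without LFS.

    compression_ratio is an assumed knob: 1.0 means each revision is kept
    at full size, smaller values model revisions that compress or delta well.
    """
    return file_size_mb * revisions * compression_ratio


# The example from the quoted answer: a 100 MB file modified 100 times.
total_mb = estimated_history_size_mb(100, 100)
print(f"Every clone downloads roughly {total_mb / 1000:.0f} GB")
```

Real repositories will usually come in below this bound, since Git compresses objects and stores deltas where it can, so treat the number as a worst-case estimate.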

Putting big files in a separate system like git-lfs allows you to store only pointers to each version of the file in the repository, hence each clone will only download a tiny piece of data for each revision. The checkout will download only the version you are using, i.e. 100 MB in the example above. As a result, you would be using disk space on the server, but saving a lot of bandwidth and disk space on the client.

In addition to this, the algorithm used by git gc (internally, git repack) does not always work well with big files. Recent versions of Git made progress in this area and it should work reasonably well, but using a big repository with big files in it may eventually get you in trouble (like not having enough RAM to repack your repository).
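The pointer idea described in the quoted answer can be illustrated with a minimal sketch that builds the small text stub committed in place of a large file. It follows the three-line pointer layout from the Git LFS specification (version, oid, size). The function name `lfs_pointer` and the 1 MiB placeholder payload are assumptions for illustration; in practice the `git lfs` client generates pointers, not hand-written code.

```python
import hashlib


def lfs_pointer(content: bytes) -> str:
    """Build pointer text in the Git LFS v1 layout for the given file content.

    Illustrative helper only: the real pointer is written by the git-lfs
    client when a tracked file is added.
    """
    oid = hashlib.sha256(content).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


# Placeholder payload standing in for a large data file (1 MiB of zero bytes).
payload = b"\0" * (1024 * 1024)
pointer = lfs_pointer(payload)

print(pointer)
print(f"payload: {len(payload)} bytes, pointer: {len(pointer)} bytes")
```

The pointer stays about the same size no matter how large the payload is, which is why each revision adds only a tiny amount of data to the repository history.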

Using Git LFS in the repo