[Feature]: package version lock files #151

Open
1 task done
precompute opened this issue Jun 29, 2023 · 23 comments
Labels
enhancement New feature or request

Comments

@precompute

Feature Description

Pinning packages makes configs reproducible. Currently, the only way to pin packages is to get the current hash for every package and add it to the appropriate use-package block.

I suggest implementing a new alist, elpaca-package-hash-alist that holds the hash for every package. Elpaca would be able to generate this alist automatically, so users could effortlessly set elpaca-package-hash-alist to this value upon startup. Elpaca would check these values and act accordingly (pull / reset / do nothing, etc). Every package with :elpaca t would be affected, and there could be a user-option for enabling this behavior.

Also, maybe this alist could be written to a .elpaca-pins file or similar?

Example:

Set PACKAGE-NAME to HASH

(add-to-list 'elpaca-package-hash-alist (cons PACKAGE-NAME HASH))

HASH could be set to t to signal an upgrade.
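
A minimal sketch of how such an alist might be consulted. Everything here is hypothetical: elpaca-package-hash-alist is only the variable proposed in this issue, not an existing Elpaca option, and my/pin-action and its return symbols are names invented for illustration.

```elisp
;; Hypothetical sketch: neither the variable nor the function exists in Elpaca.
(defvar elpaca-package-hash-alist nil
  "Proposed alist of (PACKAGE-NAME . HASH).
HASH may be t to signal that the package should be upgraded.")

(defun my/pin-action (package current-hash)
  "Return the action implied by the pin for PACKAGE given CURRENT-HASH.
PACKAGE is a symbol.  The result is one of the symbols `none',
`upgrade', or `reset'."
  (let ((pin (alist-get package elpaca-package-hash-alist)))
    (cond ((null pin) 'none)                ; package is not pinned
          ((eq pin t) 'upgrade)             ; t signals an upgrade
          ((equal pin current-hash) 'none)  ; already at the pinned commit
          (t 'reset))))                     ; move to the pinned commit
```

With this shape, a package absent from the alist is simply left alone, which matches the idea of making the behavior opt-in.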

Confirmation

  • The feature I'm proposing does not already exist in Elpaca
@precompute precompute added the enhancement New feature or request label Jun 29, 2023
@progfolio
Owner

progfolio commented Jun 29, 2023

Hi. Thanks for the suggestion. What you're suggesting is referred to as a version lock file.

There is currently the elpaca-write-lockfile function, which will create a menu item with the current recipe for each package. For example, (elpaca-write-lockfile "/tmp/test.eld") produces the following data:

((elpaca :source
   "lockfile" :date (25757 57844 206472 188000) :recipe
   (:protocol https :inherit t :depth 1 :repo
              "https://github.com/progfolio/elpaca.git" :ref
              "272966b864db86604535bced55b3dfa3c7ed8532" :pre-build
              ("git" "remote" "set-url" "origin"
               "git@github.com:progfolio/elpaca.git")
              :files (:defaults (:exclude "extensions")) :build
              (:not elpaca--activate-package) :package "elpaca"))
              ;; Other packages omitted
              )

The full recipe is stored with a computed :ref recipe keyword. What's missing is a way to rebuild packages "from scratch" so the package can be reset to that state. It's trickier than storing a commit ref, though. We'd want to disable inheritance for that recipe, possibly override the :depth keyword, etc.
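
As a rough illustration of what that data makes available, the sketch below reads such a file and pulls out each package's :ref. It assumes the file holds a single form shaped like the excerpt above, which is inferred from this one example and not from a documented format; my/lockfile-refs is a hypothetical helper, not part of Elpaca.

```elisp
;; Hypothetical helper; assumes the file contains one list of
;; (PACKAGE :source ... :recipe (PLIST)) entries as in the excerpt above.
(defun my/lockfile-refs (file)
  "Return an alist of (PACKAGE . REF) read from lockfile FILE."
  (let ((items (with-temp-buffer
                 (insert-file-contents file)
                 ;; Point is at the start of the buffer, so `read'
                 ;; parses the first (and only expected) form.
                 (read (current-buffer)))))
    (mapcar (lambda (item)
              ;; (car item) is the package symbol; (cdr item) is a plist
              ;; whose :recipe value is itself a plist holding :ref.
              (cons (car item)
                    (plist-get (plist-get (cdr item) :recipe) :ref)))
            items)))
```

This only extracts the refs; it does nothing about inheritance or :depth, which is the harder part described above.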

Pinning packages makes configs reproducible.

Lock files work so long as the upstream source still has the commit referenced in the lock file available.
However, if the upstream disappears or overwrites history, the ref is useless.

What I've been experimenting with is keeping the entire Elpaca package store in a repository.
This has the benefit that the entire source code of each library is available despite what happens upstream.
It can also simplify the machinery around restoring package state (by being a thin layer over git).
The trade-off is that a package store repository is obviously larger on disk than a lock file, but I don't think the difference is significant if you consider that the lock file doesn't do anything on its own (it would have to be used to download all the repos, anyhow).

I've experimented with a few backup strategies and I haven't decided which I'll end up with for Elpaca.
I may design it in a way where one could swap in their own strategy as well.

Related issues: #24 #36

@progfolio progfolio changed the title [Feature]: Make pinning packages easier [Feature]: package version lock files Jun 29, 2023
@axgfn

axgfn commented Jul 15, 2023

I really like the idea of keeping the entire Elpaca package store in git. Curiously, I think it could even make Elpaca usable in environments without access to git (like the new Android port of Emacs, for example). You would just download main.tar.gz from your package store repository on GitHub or other git forge.

I'm interested in Elpaca and I think it has a lot going for it, but I'm not willing to switch to another package manager until it can match straight in reproducibility. Keeping an eye on this issue.

@hammerandtongs

I'm not a fan of the entire Elpaca store in git.

I absolutely do need a lock file as it would give me a known good config.

I have 4 workstations that I use emacs on with the same git directory holding the config only.

I know some people check their elpa directory in or rsync but I don't like the idea of that for many reasons.

Despite using git and git-annex extensively I've never wanted this solution and resist having another pile to move around.

I'd like a normal Cargo.lock-style text file (very successful for Rust) that I could check into git and easily inspect and edit in emacs or vim (say, in the rare case that someone deletes a remote git repo), without doobedydeeing around in git to fix or alter things.

I don't think archiving other people's git trees is a good problem for a package manager to solve.

Any binary files will start to explode the size of the elpaca git blob.

Without storing binary artifacts the benefits some people imagine won't actually be there.

@progfolio
Owner

Any binary files will start to explode the size of the elpaca git blob.

Not many packages include binary blobs. Are there specific packages which come to mind?
It would be good to know so I can build a pessimistic test case.

Without storing binary artifacts the benefits some people imagine won't actually be there.

My hunch is that this scenario is even rarer than git repos disappearing or history being rewritten. A lockfile does not guarantee the presence of any system binaries either, so it's a shared flaw between both approaches.

There are trade-offs between both approaches and I plan on making things flexible enough to accommodate either.

@milanglacier

milanglacier commented Jul 29, 2023

I'm not a fan of the entire Elpaca store in git.

I absolutely do need a lock file as it would give me a known good config.

I have 4 workstations that I use emacs on with the same git directory holding the config only.

I know some people check their elpa directory in or rsync but I don't like the idea of that for many reasons.

Despite using git and git-annex extensively I've never wanted this solution and resist having another pile to move around.

I'd like a normal Cargo.lock-style text file (very successful for Rust) that I could check into git and easily inspect and edit in emacs or vim (say, in the rare case that someone deletes a remote git repo), without doobedydeeing around in git to fix or alter things.

I don't think archiving other people's git trees is a good problem for a package manager to solve.

Any binary files will start to explode the size of the elpaca git blob.

Without storing binary artifacts the benefits some people imagine won't actually be there.

Yes, I do agree. The Cargo.toml-style package version control system is good enough in 99% of scenarios.

In case the upstream package has changed, which happens rarely, the user can just manually switch the package upstream or reset the upstream to a fork made from their local copy.

From my perspective, maintaining the entire Elpaca store in git is not good practice for source management. I seldom see projects include the source of third-party packages in their own source code. Besides, the git repository will grow rapidly.

I currently use straight, and the directory ~/.emacs/straight/repos has a size of 391M, and that is just a snapshot. If you put this folder under source control, how large will your .git become?

Using git submodules has the same pitfall: once the upstream changes, you cannot initialize all the submodules in a fresh install either.

@axgfn

axgfn commented Jul 30, 2023

My ~/.config/emacs/straight/repos/ directory is 713M, but ~/.config/emacs/straight/build/ is only 16M when I exclude .elc files. That's more like what I imagined would be tracked in git. I'm also not worried about it ballooning too much in size over time. Git is pretty good at compression, and we'd only need a new snapshot for each time packages are updated, which I expect for most users is only a weekly or monthly chore.

@roshanshariff

@ajgrf, if I'm not mistaken, the straight/build directory usually just has the compiled .elc files, other binaries and build output, and symlinks to the original source .el files in straight/repos. The symlinks are negligible, and you're excluding the .elc files; that leaves only other miscellaneous binaries in your measurement. Needless to say, it's not enough to just track those if you want working packages.

@xendk

xendk commented Sep 8, 2023

I'll throw my vote for a simple lockfile.

While the idea of having all your packages safely stored in case GitHub blows up sounds tempting, I see it as a solution to a problem I don't have. In the most realistic case of a ref or even a complete repo disappearing, the first thing I'd be looking into is fixing the situation, finding a new package, or otherwise dealing with the problem. I use elpaca to install packages, that is, (more or less) maintained packages; I don't need it to deal with dead code that once was a package. Worst case scenario, I'll dig it out of elpaca/repos and make my own "package". What I want is to be able to say "this doesn't work. I know it worked last week. Please start up the DeLorean." And if it could integrate well with git bisecting my init.el, that would be swell.

As for the size thing.

~/.c/emacs ▶ du -hcs straight/*
0       straight/bootstrap.el
47M     straight/build
516K    straight/build-cache.el
4,0K    straight/modified
2,9G    straight/repos
16K     straight/versions
2,9G    total
~/.c/emacs ▶ du -hcs elpaca/*
32M     elpaca/builds
28M     elpaca/cache
506M    elpaca/repos
565M    total

They're not entirely equal, as there's been a bit of package churn since I switched to elpaca, but they're in the same ballpark. Obviously elpaca saves quite a bit by doing shallow copies, but it's still 500M to save the source for the build.

I'll admit to being the type that's not afraid to mess around with the source in repos when trying to fix bugs, and then forgetting about getting the changes anywhere. How would storing the packages in git deal with local modifications?

@progfolio
Owner

How would storing the packages in git deal with local modifications?

The state of the repos and builds directories is stored as is.
So if you have local modifications, they would be stored.

@xendk

xendk commented Sep 10, 2023

So if you have local modifications, they would be stored.

So how does one tell what is local modification? I assume the history of the individual repo directories isn't part of this.

Is this basically the same as adding elpaca/repos and elpaca/builds to one's .config/emacs repository (with some magic to avoid submodules for repos)?

@progfolio
Owner

progfolio commented Sep 10, 2023 via email

@xendk

xendk commented Sep 10, 2023

The git history of each repository would be preserved as well.

In my case, that's 3 gigs of data, half a gig if going with shallow checkouts. As with some of the other posters, I'm a bit skeptical...

The entire store as is is committed to the package store repository in a way that avoids submodules.

Oh, care to share your secret sauce? I'm just curious.

There would also be a minimal API around it so you don't really need to know how to use git to use it.

Ah, I think we've got the source of the dissonance in this issue here. You're working on a user-friendly, self-contained solution that can be used by anyone. But those asking for a lock file already have their config in git and are looking for a way to control elpaca from that.

It's two different user-stories, but as you say, they ought to be able to co-exist. It's "just" a matter of someone implementing elpaca-load-lockfile.

@progfolio
Owner

progfolio commented Sep 10, 2023 via email

@xendk

xendk commented Sep 10, 2023

I thought you showed 565M total in your store earlier?

Well, most of that is shallow repos. Looking about, it seems that elpaca will do shallow clones and then fetch new history when updating? I'll admit I've never worked much with shallow clones.

For example, you'd need a way to say "rebuild these packages from
scratch".

Why does it need to re-clone? Nuking the build dir seems like a sensible cleanup, but why re-clone if the ref we're updating/downgrading to is in the repo? If you're trying to revert to a working configuration, the needed ref should already be available (unless shallow copies get in the way, of course). I would think that bringing repos to the same version as the lockfile and rebuilding packages that were changed should suffice (well, plus cloning stuff that hasn't yet, to support the "rebuild from scratch" scenario).
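
The approach described above (bring repos to the lockfile's versions, then rebuild only what changed) hinges on knowing which packages differ. A minimal sketch of that comparison, assuming both the locked and the current state are available as alists of (PACKAGE . REF); my/packages-to-reset is a hypothetical name, and nothing here touches git or Elpaca itself.

```elisp
(require 'seq)

;; Hypothetical helper: pure comparison of two (PACKAGE . REF) alists.
(defun my/packages-to-reset (locked current)
  "Return the (PACKAGE . REF) pairs from LOCKED whose ref differs in CURRENT.
Packages missing from CURRENT are included as well, since they would
still need to be cloned."
  (seq-remove (lambda (pin)
                ;; Keep only pins that do not match the current ref.
                (equal (cdr pin) (alist-get (car pin) current)))
              locked))
```

Packages present in CURRENT but absent from LOCKED are ignored by this sketch; whether those should be removed or left alone is a separate design decision.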

without losing any possible changes to the repo. It sounds easy until you start implementing it.

Well yeah... Would it help if the prerequisite for elpaca-write-lockfile was no uncommitted changes? It does open up new ways to shoot oneself in the foot (if one nukes the local repo and had a local branch with changes for instance), but it could work.
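
A sketch of what such a precondition check might look like, under the assumption that "no uncommitted changes" means `git status --porcelain` prints nothing (this treats untracked files as changes too). my/repo-dirty-p is a hypothetical helper and not something Elpaca provides.

```elisp
;; Hypothetical helper; relies only on the `git' executable being on PATH.
(defun my/repo-dirty-p (dir)
  "Return non-nil if the git repository at DIR has uncommitted changes.
Untracked files count as changes.  Signal an error if git fails."
  (let ((default-directory (file-name-as-directory (expand-file-name dir))))
    (with-temp-buffer
      ;; --porcelain prints one line per changed or untracked path
      ;; and nothing at all for a clean working tree.
      (unless (eq 0 (call-process "git" nil t nil "status" "--porcelain"))
        (error "git status failed in %s" dir))
      (> (buffer-size) 0))))
```

A check like this would not catch the local-branch case mentioned above, since committed work on an unpushed branch leaves the working tree clean.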

@progfolio
Owner

progfolio commented Sep 11, 2023 via email

@psionic-k
Contributor

Lock files work so long as the upstream source still has the commit referenced in the lock file available.
However, if the upstream disappears or overwrites history, the ref is useless.

How about creating a nix profile depending on the git sources for that set? As long as you hold onto the resulting profile as a GC root, all the git sources will remain in the store. Since the Guix store is basically the same implementation, both of these systems can be used similarly for holding onto snapshots of all the packages efficiently.

https://nixos.org/manual/nix/stable/package-management/profiles

To rehydrate, you would just copy the immutable git sources into /repos and rebuild everything and maybe do updates.

@roshanshariff

@psionic-k You could probably achieve the same thing without nix by creating git branches in the same repository, one for each upstream repo, pointing at the commit you're using. I suspect this is the approach @progfolio is considering as the "full backup" method? You could check out the individual branches as worktrees to share the git repository and objects between them.

The downside is that git doesn't expect to be used in this way, so it'll be a bit harder to interact with the upstream repos and push patches, etc. But I guess it would work for backups, since you could use the recipe metadata to reconstruct things like upstream URLs and branch names that would normally be in the config of a checked out git repo.

@dominicm00

I will give my 2¢ that this problem is the entire purpose of https://archive.softwareheritage.org/. There's nothing wrong with giving people the option to create full git backups as described, but frankly I think falling back to software heritage on clone failure is simple and robust. Guix, a project which takes source reproducibility very seriously, takes this approach.

@progfolio
Owner

Thanks for chiming in.

I will give my 2¢ that this problem is the entire purpose of https://archive.softwareheritage.org/.

Cool project. However, after kicking the tires, it looks like it's missing quite a few of my github repositories.

There's nothing wrong with giving people the option to create full git backups as described, but frankly I think falling back to software heritage on clone failure is simple and robust. Guix, a project which takes source reproducibility very seriously, takes this approach.

I have an idea for how to implement simple lockfiles which will be at least on par with what straight.el offers (with a better UI). The main hurdle now is time. Money is tight for me right now (unfortunately, I don't pay my bills by writing software) so I've had to pick up two jobs and am working long hours most days. When I get some time I will implement the idea I have.

@dominicm00

I will give my 2¢ that this problem is the entire purpose of https://archive.softwareheritage.org/.

Cool project. However, after kicking the tires, it looks like it's missing quite a few of my github repositories.

I'm surprised! Usually anything on GitHub is on there. Maybe I'll look into creating a (M)ELPA lister so that published emacs packages are indexed more regularly. It's also possible I can make a submission tool within emacs...will take a look.

I have an idea for how to implement simple lockfiles which will be at least on par with what straight.el offers (with a better UI). The main hurdle now is time. Money is tight for me right now (unfortunately, I don't pay my bills by writing software) so I've had to pick up two jobs and am working long hours most days. When I get some time I will implement the idea I have.

Of course; you've already created more than enough incredible software for free! Thank you so much for what you've done already! IMO elpaca is basically as close to perfect as we have in a package manager ❤️

@Martinsos

Martinsos commented Jan 19, 2025

What would you suggest as the best solution in the meantime, while waiting for the full support to be implemented? On one hand I like the idea of having something like a lock file, but I would also like to be resistant to network issues like the ones that GNU Savannah seems to often have, where a lock file won't help since I can't download the packages. Would the solution then be to just version control the elpaca/ directory (assuming I am OK with committing that many MB into my git repo)?

@progfolio
Owner

What would you suggest as the best solution in the meantime, while waiting for the full support to be implemented?

I'll get simple support in sooner rather than later.

On one hand I like the idea of having something like a lock file, but I would also like to be resistant to network issues like the ones that GNU Savannah seems to often have, where a lock file won't help since I can't download the packages.

Bear in mind what's being downloaded from GNU's servers is the package recipes.
Most of the packages are developed elsewhere, so a lockfile would work around the issue of the recipe source going down (as it has for the past couple of days), and can be tracked alongside one's init file in version control.
I've reworked the current ELPA menus so they should be working now.

Would the solution then be to just version control the elpaca/ directory (assuming I am OK with committing that many MB into my git repo)?

That's one way. You should be able to discard the builds directory, since that can be rebuilt from the info in the cache directory and the repositories.

@Martinsos

Bear in mind what's being downloaded from GNU's servers is the package recipes. Most of the packages are developed elsewhere, so a lockfile would work around the issue of the recipe source going down (as it has for the past couple of days), and can be tracked alongside one's init file in version control. I've reworked the current ELPA menus so they should be working now.

Got it! I just learned about the whole idea of tarballs vs recipes in the last couple of days so this is starting to make sense now.
But in any case, having the elpaca dir version controlled should allow me to roll back independently of non-local factors (like a git repo going down) - that is good to know. Btw, for me the builds dir is quite small compared to the rest of the elpaca dir, so I will probably commit that one also, although I guess that depends on the specific packages of course.

Thanks!

10 participants