Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

trivial-builders: init addTmateBreakpoint #334924

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

MatthewCroughan
Copy link
Contributor

@MatthewCroughan MatthewCroughan commented Aug 15, 2024

Description of changes

Testable with nix-build -A tests.trivial-builders.manualTesting.addTmateBreakpoint

This adds a function to Nixpkgs that allows you to wrap a derivation and insert a hook that creates an ssh session with tmate, allowing you to enter the derivation at the exact point it went wrong, regardless of whether the derivation is being built on a remote-builder.

{ addTmateBreakpoint, hello }:
(addTmateBreakpoint hello).overrideAttrs { postPatch = "exit 1"; }

Will result in the following

❯ nix-build -A tests.trivial-builders.addTmateBreakpoint
evaluation warning: addTmateBreakpoint in use, ignore the sha256 warning and do not leak these build logs, otherwise unauthorized ssh access to the build sandbox may occur
evaluation warning: calling makeSetupHook without passing a name is deprecated.
warning: found empty hash, assuming 'sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA='
this derivation will be built:
  /nix/store/ryf52zzfkk1h7m9w5w1r8m2s7gb45vlw-hello-2.12.1.drv
resolved derivation: '/nix/store/ryf52zzfkk1h7m9w5w1r8m2s7gb45vlw-hello-2.12.1.drv' -> '/nix/store/lbx9yvrzmkjn8s892lc6hgr74d2sia7x-hello-2.12.1.drv'...
building '/nix/store/lbx9yvrzmkjn8s892lc6hgr74d2sia7x-hello-2.12.1.drv'...
Using versionCheckHook
Running phase: unpackPhase
unpacking source archive /nix/store/pa10z4ngm0g83kx9mssrqzz30s84vq7k-hello-2.12.1.tar.gz
source root is hello-2.12.1
setting SOURCE_DATE_EPOCH to timestamp 1653865426 of file hello-2.12.1/ChangeLog
Running phase: patchPhase
build failed in patchPhase with exit code 1
### WARNING ###
addTmateBreakpoint in use, ignore the sha256 warning and do not leak these build logs, otherwise unauthorized ssh access to the build sandbox may occur
To attach using tmate:

To connect to the session locally, run: tmate -S /tmp/tmate-1000/Q1zzAv attach
Connecting to ssh.tmate.io...
web session read only: https://tmate.io/t/ro-VvJLT6c7tz5k9gD9gbjrDgScx
ssh session read only: ssh [email protected]
web session: https://tmate.io/t/uxxQTYbdBQ47ECELbM4BNAgQN
ssh session: ssh [email protected]

There's is a potential issue issue that this hook can introduce new and unexpected build failure, just by the fact that it provides internet access to a build that previously didn't have it, which can limit its usefulness in certain contexts. But nonetheless, I've found it very useful in practice

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 24.11 Release Notes (or backporting 23.11 and 24.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

@@ -33,4 +33,5 @@ recurseIntoAttrs {
inherit (references) samples;
};
writeTextFile = callPackage ./write-text-file.nix {};
addTmateBreakpoint = callPackage ./addTmateBreakpoint.nix {};
Copy link
Contributor

@philiptaron philiptaron Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm scared of running this test on Hydra / ofBorg / nix.ci / my own machine. Is there a way to prove that this is working that doesn't mean there could be an open connection out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could re-override the attrs to make it no longer a FOD, and maybe look for some log lines, and grep them to ensure that tmate was spawned. I'm not sure how to do this with the trivial-builders testing infrastructure though, as I don't think there's a way of doing "golden testing", i.e checking output matches an expected string. I'm not sure how to best implement this, suggestions definitely welcome.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if you could use the nixosTests infrastructure instead, and do the Nix build on a VM which was guaranteed not to have network connectivity out... then scrape the build output for the whole nixosTest derivation using something like testBuildFailure to see the appropriate incantations are added.

Even better would look like a second VM which connected to the first through the SSH connection to prove the end-to-end workflow was successful. We have a couple of those in the nixosTest world, especially in the networking tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I have the kind of time needed to implement that kind of test, and the value is unknown since tmate is not self-hosted anyway. A self-hosted tmate would be possible with the test you suggested which would be end-to-end. But yeah, I don't have the time to implement that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's super fair. I don't think I'm comfortable with having this specific test in CI (as I currently understand it) so this might need to go into the "documented but not tested" bucket.

I'm also open to learning about CI and why this is safe.

@philiptaron
Copy link
Contributor

This is really cool.

Copy link
Member

@roberth roberth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!
This will be particularly helpful with remote builders, where you'd otherwise need both ssh and nsenter.

Comment on lines 617 to 618
printf "To attach using tmate:\n\n"
${util-linux}/bin/script -qefc "${tmate}/bin/tmate -F"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe factor these two lines out so you can put an explicit breakpoint call anywhere in the derivation build script?

Copy link
Contributor Author

@MatthewCroughan MatthewCroughan Aug 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, to call the hook explicitly, you just have to place an exit 1; anywhere. For me, the point of the hook is to drop you in where a derivation fails. Are you proposing not doing that and instead making it a function you call from in the drv phase, rather than exit 1 to call the hook? Not sure what is meant by "factor these two lines out" otherwise, can you explain more?

Comment on lines +594 to +614
/*
Adds a hook to the nativeBuildInputs of a derivation which will provide an
SSH session via tmate to the build environment, for interactive
debugging

Example:

# Spawn a tmate session after the `buildPhase` of the `hello` derivation
addTmateBreakpoint (pkgs.hello.overrideAttrs { buildPhase = "exit 1;"; })
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you move this into the manual?
See e.g. https://nixos.org/manual/nixpkgs/stable/index.html#breakpointhook, which would be good to link back and forth.

@MatthewCroughan
Copy link
Contributor Author

@roberth

where you'd otherwise need both ssh and nsenter.

This should also work fine on Darwin and BSDs whereas nsentr/cntr wouldn't.

@tomberek
Copy link
Contributor

Thanks for adding this. It is a potential footgun, but often helpful. One way to mitigate the security issue is to leverage tmate's cli a bit more:

tmate -a ~/.ssh/authorized_keys

So people can add the hook, but also place their public keys into an envvar or file and the hook uses it.

@philiptaron
Copy link
Contributor

@MatthewCroughan: what about adding this as passthru.breakpointHook on the tmate derivation, so that it's not in the stdenv and is more easily discoverable for those that use tmate?

@MatthewCroughan MatthewCroughan force-pushed the mc/addTmateBreakpoint branch 2 times, most recently from e9b77ba to ceb96a7 Compare August 27, 2024 15:04
Copy link
Contributor

@philiptaron philiptaron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://gist.github.com/GrahamcOfBorg/68f9912aea02d01bff0a81ae3881f138

ofBorg isn't happy with the test, and I think I'd rather have this code in nixpkgs sans test than have a test which uses it.

@MatthewCroughan MatthewCroughan force-pushed the mc/addTmateBreakpoint branch 2 times, most recently from 0f27434 to 78dfc6a Compare September 8, 2024 16:50
@ofborg ofborg bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux labels Sep 8, 2024
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't actually seem like a trivial builder to me.
I'd think it to be fine if this were in its own file in pkgs/build-support, referenced from all-packages.nix (which contains more than just packages anyway).

Maybe there's a better option I'm not thinking of. Might @infinisil have a suggestion?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with Robert that this isn't a trivial builder.

I prefer hanging this off of the tmate derivation as a passthru tmate.breakpointHook, following the existing breakpointHook that uses cntr.

Copy link
Contributor

@philiptaron philiptaron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting changes to move this out of trivial-builders; I've made a suggestion of tmate.breakpointHook but there are many other reasonable choices.

@@ -110,7 +110,7 @@ let
inherit lib;
inherit (self) config;
inherit (self) runtimeShell stdenv stdenvNoCC;
inherit (self.pkgsBuildHost) jq shellcheck-minimal;
inherit (self.pkgsBuildHost) jq shellcheck-minimal unixtools tmate;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this change needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because in order to provide unixtools and tmate to trivial-builders.nix so that we can put the addTmateBreakpoint function in there. Though as @roberth says, trivial-builders.nix may not be the correct location for this to reside.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with Robert that this isn't a trivial builder.

I prefer hanging this off of the tmate derivation as a passthru tmate.breakpointHook, following the existing breakpointHook that uses cntr.

@MatthewCroughan MatthewCroughan force-pushed the mc/addTmateBreakpoint branch 3 times, most recently from 0ee166a to b3e3bf9 Compare November 20, 2024 20:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants