flux exec --jobid doesn't work for job owner in system instance #6548

garlick · 2025-01-10T17:27:50Z

Problem: intuitively you'd expect flux exec --jobid ID to work in the system instance when the invoking user is the same as the job owner of ID, but it does not.

The reason is noted in the code:

    /* The embedded subprocess server restricts access based on FLUX_ROLE_OWNER,
     * but this shell cannot trust message credentials if they are passing through
     * a Flux instance running as a different user (e.g. the "flux" user in a
     * system instance).  If that user were compromised, they could run arbitrary
     * commands as any user that currently has a job running.  Therefore, this
     * additional check ensures that we only trust an instance running as the same
     * user.
     *
     * For good measure, check that the shell userid matches the credential
     * userid. After the above check, this could only fail in test where the
     * owner can be mocked.
     */
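To make the layering in that comment concrete, here is a minimal sketch of the decision it describes. It is not the shell's actual code: `sketch_cred`, `SKETCH_ROLE_OWNER`, and `sketch_exec_allowed()` are hypothetical names invented for illustration, and the role bit value is a stand-in rather than the flux-core definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical stand-in for the owner role bit (not the flux-core value). */
#define SKETCH_ROLE_OWNER 0x1u

/* Hypothetical view of the credential carried by an exec request. */
struct sketch_cred {
    uid_t userid;       /* userid claimed by the message */
    uint32_t rolemask;  /* roles claimed by the message */
};

/* Allow the request only if:
 *  1. the credential claims the owner role,
 *  2. the enclosing instance was independently judged trustworthy, and
 *  3. the credential userid matches the userid the shell runs as.
 */
bool sketch_exec_allowed (const struct sketch_cred *cred,
                          bool parent_is_trusted,
                          uid_t shell_uid)
{
    assert (cred != NULL);
    if (!(cred->rolemask & SKETCH_ROLE_OWNER))
        return false;
    if (!parent_is_trusted)
        return false;
    return cred->userid == shell_uid;
}
```

In a system instance the second condition is false for guest shells, which is why the request is rejected even when the invoking user owns the job.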

The method used to check whether the instance is to be trusted is:

    /* Determine if this shell is running as the instance owner, without
     * trusting the instance owner to tell us.  Since the parent of a guest
     * shell is flux-imp(1), kill(2) of the parent pid should fail for guests.
     */
    pid_t ppid = getppid (); // 0 =  parent is in a different pid namespace
    if (ppid > 0 && kill (getppid (), 0) == 0)
        rexec->parent_is_trusted = true;
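For anyone who wants to try the probe outside the shell, here is a standalone sketch of the same idea. The helper names `sketch_can_signal()` and `sketch_parent_is_trusted()` are made up for this example. It relies only on the POSIX behavior that `kill()` with signal 0 performs the existence and permission checks without delivering a signal.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Return true if this process is permitted to signal 'pid'.
 * Signal 0 sends nothing; kill(2) only checks existence and permission.
 * Non-positive pids are rejected: 0 can mean the parent lives in a
 * different pid namespace, and negative values address process groups.
 */
bool sketch_can_signal (pid_t pid)
{
    if (pid <= 0)
        return false;
    return kill (pid, 0) == 0;
}

/* Apply the probe to the parent process, as the shell snippet does. */
bool sketch_parent_is_trusted (void)
{
    return sketch_can_signal (getppid ());
}
```

A process can always signal itself, so `sketch_can_signal (getpid ())` returns true. Per the comment above, a guest shell whose parent is flux-imp(1) should get false from `sketch_parent_is_trusted ()`.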

Discussion began in #6546 (Allow tool access to Flux while running outside an allocation).

garlick · 2025-01-10T17:40:30Z

@grondo mentioned munge as an idea for enhancing message credentials in this case.

I wonder if we could amend the subprocess protocol to allow a signing method from flux-security to be used on the exec requests?

grondo · 2025-01-13T20:30:56Z

Another use case for a signing method in libsubprocess would be to allow sysadmins to use flux exec to invoke commands as themselves or as user root across a cluster running Flux as the system resource manager. Unscientific tests have shown flux exec to be a couple of orders of magnitude faster than pdsh and clush.

I'm not sure if the same signing method could be used for the flux exec --jobid use case and the "run as root" use case, since the latter requires participation of the IMP, but thought I'd mention it here so we can keep it in mind during design.

garlick · 2025-01-13T20:51:53Z

> I'm not sure if the same signing method could be used for the flux exec --jobid use case and the "run as root" use case, since the latter requires participation of the IMP, but thought I'd mention it here so we can keep it in mind during design.

I can't think of why not if we stick with flux_sign_wrap()? Did you have a case in mind?

If not configured --with-flux-security, we can fall back to core's sign_none_wrap() and the existing paranoid check for job owner == instance owner in the shell.
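As a rough sketch of that fallback policy (nothing here is a proposed API: the `sketch_` names, the struct, and the two-value mechanism enum are all hypothetical, and the verification step is reduced to a boolean), the shell's decision could look something like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical classification of how a request payload was wrapped. */
enum sketch_sign_mech {
    SKETCH_MECH_NONE,    /* unsigned wrapper, e.g. built without flux-security */
    SKETCH_MECH_SECURE,  /* wrapper that is cryptographically verifiable */
};

/* Hypothetical result of unwrapping a signed exec request. */
struct sketch_request {
    enum sketch_sign_mech mech;
    bool unwrap_ok;    /* did unwrapping/verification succeed? */
    uid_t signer_uid;  /* userid asserted by the wrapper */
};

/* Trust a verifiable signature from the shell's own user outright.
 * With the "none" mechanism the userid is only a claim, so keep the
 * existing paranoid check that the instance itself is trusted.
 */
bool sketch_request_trusted (const struct sketch_request *req,
                             uid_t shell_uid,
                             bool parent_is_trusted)
{
    assert (req != NULL);
    if (!req->unwrap_ok || req->signer_uid != shell_uid)
        return false;
    if (req->mech == SKETCH_MECH_NONE)
        return parent_is_trusted;
    return true;
}
```

Under these assumptions, a job owner in a system instance is accepted only when the request carries a verifiable signature. Builds without flux-security behave as they do today.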

grondo · 2025-01-13T21:22:51Z

Well, I hadn't thought very far on this one, only that the IMP is a separate process while the job shell subprocess server implements the exec protocol directly. If we have some way for the IMP to extract the signature when it is invoked by libsubprocess, that would be sufficient, but I'm having trouble at the moment envisioning how that would actually happen. (It differs from the job execution case, where the IMP extracts J with the signature from the KVS out of band.)

Edit: Thinking about it for one second longer, this would require an extension on the subprocess server side to invoke the IMP and pass the credential via a new protocol between invoker and IMP. Therefore, once the credential is available, we could open a separate issue to add a new subcommand to the IMP to accept this mode of invoking a command with a uid transition.

garlick · 2025-01-13T21:45:38Z

Ah right, I didn't think that through either! That makes perfect sense to me.
