Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LibOS] Delay IPC leader notification until it has 0 connections #1588

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

stefanberger
Copy link
Contributor

@stefanberger stefanberger commented Oct 3, 2023

To avoid killing the IPC leader before all the child processes have died, delay notifying the leader that it should terminate until it has 0 connections.

I think this could fix the following issues (famous last words?)
Fixes: Issue #21
Fixes: Issue #1519
Fixes: Issue #1520

However, I know that this patch may trigger a lot more of these types of messages here:

(libos_ipc_child.c:23:ipc_child_disconnect_callback) [P1:libos] debug: Unknown process (vmid: 0x2) disconnected

However, these message with 'vmid' are following this type of message with 'pid: 2' and are harmless from what I can see. (matching pid 2 to vmid 2!)

(libos_ipc_child.c:63:ipc_cld_exit_callback) [P1:libos] debug: Child process (pid: 2) died

I think these messages are the lesser evil and could possibly be removed from the code.

How to test this PR?

Use the test cases provided in #1520 or #1519.


This change is Reviewable

@stefanberger stefanberger marked this pull request as draft October 3, 2023 20:04
@stefanberger stefanberger force-pushed the stefanberger/delay_ipc_leader_until_0_connections branch from 997c2f5 to c8b04ab Compare October 3, 2023 21:16
@stefanberger stefanberger changed the title WIP: [LibOS] Delay IPC leader notification until it has 0 connections [LibOS] Delay IPC leader notification until it has 0 connections Oct 4, 2023
@stefanberger stefanberger marked this pull request as ready for review October 4, 2023 13:29
@stefanberger stefanberger force-pushed the stefanberger/delay_ipc_leader_until_0_connections branch from df0b192 to 7e75781 Compare October 4, 2023 13:43
@mkow mkow requested a review from boryspoplawski October 5, 2023 07:45
Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski and @stefanberger)


libos/src/ipc/libos_ipc_worker.c line 112 at r1 (raw file):

static void del_ipc_connection(struct libos_ipc_connection* conn,
                               PAL_HANDLE *notifier) {

ditto for the whole PR

Suggestion:

PAL_HANDLE* notifier

Copy link
Contributor

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 files reviewed, 2 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @stefanberger)


libos/src/ipc/libos_ipc_worker.c line 440 at r1 (raw file):

        while (__atomic_load_n(&g_ipc_connections_cnt, __ATOMIC_ACQUIRE) > 0) {
            PalEventWait(leader_notifier, &timeout_us);
            if (!__atomic_load_n(&g_clear_on_worker_exit, __ATOMIC_ACQUIRE))

How can this if ever happen? Worker thread is informed that it should exit only 4 lines below (set_pollable_event)

Copy link
Contributor Author

@stefanberger stefanberger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 files reviewed, 2 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski and @mkow)


libos/src/ipc/libos_ipc_worker.c line 112 at r1 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto for the whole PR

Done.


libos/src/ipc/libos_ipc_worker.c line 440 at r1 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

How can this if ever happen? Worker thread is informed that it should exit only 4 lines below (set_pollable_event)

What is 'this' -- you mean 'this' is 'the problem' that this patch is solving? It can happen when the initial process (pid 1) exits before its children.

Copy link
Contributor

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 files reviewed, 2 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @mkow and @stefanberger)


libos/src/ipc/libos_ipc_worker.c line 440 at r1 (raw file):

Previously, stefanberger (Stefan Berger) wrote…

What is 'this' -- you mean 'this' is 'the problem' that this patch is solving? It can happen when the initial process (pid 1) exits before its children.

I meant how can this if statement ever be true

@stefanberger stefanberger force-pushed the stefanberger/delay_ipc_leader_until_0_connections branch from ec79e22 to a995276 Compare October 5, 2023 12:54
Copy link
Contributor Author

@stefanberger stefanberger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 files reviewed, 2 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski and @mkow)


libos/src/ipc/libos_ipc_worker.c line 440 at r1 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

I meant how can this if statement ever be true

Good point. Removed.

@stefanberger stefanberger force-pushed the stefanberger/delay_ipc_leader_until_0_connections branch from ce8d313 to a3427ee Compare October 5, 2023 13:28
Copy link
Contributor Author

@stefanberger stefanberger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The iret_emulation test had caused and issue that I had not seen before. I now squashed the patches (to better see the code) and fixed this by initializing the notifier handle earlier.

Reviewable status: 0 of 1 files reviewed, 2 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @boryspoplawski and @mkow)

Copy link
Contributor Author

@stefanberger stefanberger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now there's an issue in ltp test sched_setaffinity01 where one child is stuck in the pause() system call. This is a test case that is not expected to work per ltp.cfg but we need it to work for the number of ipc clients to decrease to zero.

Reviewable status: 0 of 1 files reviewed, 2 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @boryspoplawski and @mkow)

Copy link
Contributor Author

@stefanberger stefanberger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now there's an issue in ltp test sched_setaffinity01 where one child is stuck in the pause() system call. This is a test case that is not expected to work per ltp.cfg but we need it to work for the number of ipc clients to decrease to zero.

An implementation of setresuid() supporting setting euid would resolve the issue.

Reviewable status: 0 of 1 files reviewed, 2 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @boryspoplawski and @mkow)

Copy link
Contributor Author

@stefanberger stefanberger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is blocked by PR #1590 implementing setresuid() support.

Reviewable status: 0 of 1 files reviewed, 2 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @boryspoplawski and @mkow)

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r4, all commit messages.
Reviewable status: all files reviewed, 13 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @mkow, and @stefanberger)

a discussion (no related file):
What happens when the leader process (P0) is sent a SIGTERM (request to terminate) while its child process (P1) is still running?

In normal Linux, P0 should terminate and P1 will terminate with it.

In your PR, it looks like SIGTERM will be ignored? Or, more specifically, P0 will terminate everything except the IPC helper thread. So we'll get a weird state when SIGTERM was partially ignored, and Gramine still runs even though the user asked it to die.

(To test this scenario, you need to specify sys.enable_sigterm_injection = true in the manifest. Otherwise Gramine processes will not react to SIGTERM at all.)



libos/src/ipc/libos_ipc_worker.c line 68 at r4 (raw file):

};

static PAL_HANDLE leader_notifier;

All global-scope variables must be prepended with g_. This includes even static keyword variables.


libos/src/ipc/libos_ipc_worker.c line 77 at r4 (raw file):

}

static inline int is_ipc_leader(void) {

bool


libos/src/ipc/libos_ipc_worker.c line 116 at r4 (raw file):

static void del_ipc_connection(struct libos_ipc_connection* conn,
                               PAL_HANDLE* notifier) {

Why do you need this notifier argument?

  • You are either the leader (is_ipc_leader() == true), then you know that the event is g_leader_notifier
  • Or you are the child (is_ipc_leader() == false), then you know that there's no event so you never call PalEventSet()

libos/src/ipc/libos_ipc_worker.c line 125 at r4 (raw file):

    if (notifier && g_ipc_connections_cnt == 0)
        PalEventSet(*notifier);

This feels wrong. What prevents the compiler/CPU to reorder PalEventSet() and g_ipc_connections_cnt? I mean, this func can execute on CPU0 and the waiting func (terminate_ipc_worker()) can execute on CPU1. Isn't it possible that CPU1 will notice the event and wake up, but won't notice g_ipc_connections_cnt being updated?

I feel like accessing g_ipc_connections_cnt should be either made atomic or surrounded by a spinlock. I would prefer a spinlock.


libos/src/ipc/libos_ipc_worker.c line 226 at r4 (raw file):

}

static noreturn void ipc_worker_main(PAL_HANDLE* notifier) {

ditto (no need for notifier arg)


libos/src/ipc/libos_ipc_worker.c line 373 at r4 (raw file):

        g_ipc_connections_cnt = 0;
        PalEventSet(*notifier);
    }

What's the point in these lines? We'll perform PalProcessExit(1) anyway, which is basically "die process, die immediately, I don't care about anything".


libos/src/ipc/libos_ipc_worker.c line 389 at r4 (raw file):

    log_debug("IPC worker started");

    PAL_HANDLE* notifier = is_ipc_leader() ? &leader_notifier : NULL;

This line won't be needed if you'll just do it more explicitly, as I described in another comment.


libos/src/ipc/libos_ipc_worker.c line 408 at r4 (raw file):

    /* IPC leader gets a notifier used in terminate_ipc_leader */
    if (is_ipc_leader() &&
        PalEventCreate(&leader_notifier, false, false) < 0) {

Can you specify what these false args are? Using the pattern:

PalEventCreate(&leader_notifier, /*argname=*/false, /*argname=*/false);

libos/src/ipc/libos_ipc_worker.c line 410 at r4 (raw file):

        PalEventCreate(&leader_notifier, false, false) < 0) {
        log_error("IPC leader: PalEventCreate() failed");
        return -ENOMEM;

This should actually be the result of PalEventCreate() converted from PAL error code to UNIX error code, not a hardcoded ENOMEM.


libos/src/ipc/libos_ipc_worker.c line 437 at r4 (raw file):

void terminate_ipc_worker(void) {
    if (is_ipc_leader()) {
        PalEventClear(leader_notifier);

Why is this line needed?

Copy link
Contributor Author

@stefanberger stefanberger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 13 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, and @mkow)


libos/src/ipc/libos_ipc_worker.c line 68 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

All global-scope variables must be prepended with g_. This includes even static keyword variables.

Done.


libos/src/ipc/libos_ipc_worker.c line 77 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

bool

Done.


libos/src/ipc/libos_ipc_worker.c line 116 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Why do you need this notifier argument?

  • You are either the leader (is_ipc_leader() == true), then you know that the event is g_leader_notifier
  • Or you are the child (is_ipc_leader() == false), then you know that there's no event so you never call PalEventSet()

Done.


libos/src/ipc/libos_ipc_worker.c line 125 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

This feels wrong. What prevents the compiler/CPU to reorder PalEventSet() and g_ipc_connections_cnt? I mean, this func can execute on CPU0 and the waiting func (terminate_ipc_worker()) can execute on CPU1. Isn't it possible that CPU1 will notice the event and wake up, but won't notice g_ipc_connections_cnt being updated?

I feel like accessing g_ipc_connections_cnt should be either made atomic or surrounded by a spinlock. I would prefer a spinlock.

g_ipc_connections_cnt is incremented/decremented by add/del_ipc_connection, which are both called by the ipc_worker_main(). Besides that it's read then only in terminate_ipc_worker where I read it like this:

    if (is_ipc_leader()) {
        PalEventClear(g_leader_notifier);
        while (__atomic_load_n(&g_ipc_connections_cnt, __ATOMIC_ACQUIRE) > 0 &&
               PalEventWait(g_leader_notifier, NULL) < 0);
    }
    set_pollable_event(&g_worker_thread->pollable_event);

So I thought that making it atomic or surrounding it with a spinlock wouldn't help much.


libos/src/ipc/libos_ipc_worker.c line 226 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

ditto (no need for notifier arg)

Done.


libos/src/ipc/libos_ipc_worker.c line 373 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

What's the point in these lines? We'll perform PalProcessExit(1) anyway, which is basically "die process, die immediately, I don't care about anything".

Done. Removed.


libos/src/ipc/libos_ipc_worker.c line 389 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

This line won't be needed if you'll just do it more explicitly, as I described in another comment.

Done.


libos/src/ipc/libos_ipc_worker.c line 410 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

This should actually be the result of PalEventCreate() converted from PAL error code to UNIX error code, not a hardcoded ENOMEM.

Done.


libos/src/ipc/libos_ipc_worker.c line 437 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Why is this line needed?

In case the g_ipc_connections_cnt had dropped down to 0 before at some point, and the event was set agt that point, but now g_ipc_connection_cnt is > 0.

@stefanberger stefanberger force-pushed the stefanberger/delay_ipc_leader_until_0_connections branch from 284118e to 32f8efa Compare January 8, 2024 16:13
Copy link
Contributor Author

@stefanberger stefanberger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 13 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, and @mkow)


libos/src/ipc/libos_ipc_worker.c line 408 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Can you specify what these false args are? Using the pattern:

PalEventCreate(&leader_notifier, /*argname=*/false, /*argname=*/false);

Done.

@stefanberger stefanberger force-pushed the stefanberger/delay_ipc_leader_until_0_connections branch from 32f8efa to 6a91b33 Compare January 8, 2024 17:03
@stefanberger stefanberger force-pushed the stefanberger/delay_ipc_leader_until_0_connections branch from 6a91b33 to 244854e Compare January 16, 2024 01:27
@stefanberger
Copy link
Contributor Author

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

What happens when the leader process (P0) is sent a SIGTERM (request to terminate) while its child process (P1) is still running?

In normal Linux, P0 should terminate and P1 will terminate with it.

In your PR, it looks like SIGTERM will be ignored? Or, more specifically, P0 will terminate everything except the IPC helper thread. So we'll get a weird state when SIGTERM was partially ignored, and Gramine still runs even though the user asked it to die.

(To test this scenario, you need to specify sys.enable_sigterm_injection = true in the manifest. Otherwise Gramine processes will not react to SIGTERM at all.)

Thanks, I did not know about this case. I now added a force parameter to terminate_ipc_worker so that the IPC leader can be forced to be terminated rather than having it wait for 0 connections. It now reacts to SIGTERM sent to the 'root' process. I tested this with bash and 2 child bash'es and it behaves as before in this case.

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r5, all commit messages.
Reviewable status: all files reviewed, 14 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @boryspoplawski, @mkow, and @stefanberger)


-- commits line 13 at r5:
We don't use the term "root", we just use the term "main process" or "leader process". I'd prefer "main" here.


-- commits line 14 at r5:
The "leader" term applies to the process. We have one "main" process who is the parent of all "child" processes. We use the term "leader" to denote that the main process decides on some global-state events on behalf of its child processes.

Next, there is an "IPC worker/helper thread", which is a thread in each of the processes. There is no such thing as the "IPC leader", because this would imply "a leader among threads of the same process", which is incorrect.

So, the correct phrasing is "to terminate the IPC worker thread of the leader process". I would like this more explicit sentence rather than a vague "IPC leader".


libos/include/libos_ipc.h line 67 at r5 (raw file):

 * \brief Terminate the IPC worker thread.
 *
 * \param force  Whether to force the termination even if the IPC thread is the leader

Why not simply Whether to force the termination of the IPC thread

Then a curious reader may learn why we don't always force the termination by looking at the source code of this function.


libos/src/ipc/libos_ipc_worker.c line 125 at r4 (raw file):

Previously, stefanberger (Stefan Berger) wrote…

g_ipc_connections_cnt is incremented/decremented by add/del_ipc_connection, which are both called by the ipc_worker_main(). Besides that it's read then only in terminate_ipc_worker where I read it like this:

    if (is_ipc_leader()) {
        PalEventClear(g_leader_notifier);
        while (__atomic_load_n(&g_ipc_connections_cnt, __ATOMIC_ACQUIRE) > 0 &&
               PalEventWait(g_leader_notifier, NULL) < 0);
    }
    set_pollable_event(&g_worker_thread->pollable_event);

So I thought that making it atomic or surrounding it with a spinlock wouldn't help much.

TODO(myself): think about it more.


libos/src/ipc/libos_ipc_worker.c line 437 at r4 (raw file):

Previously, stefanberger (Stefan Berger) wrote…

In case the g_ipc_connections_cnt had dropped down to 0 before at some point, and the event was set agt that point, but now g_ipc_connection_cnt is > 0.

TODO(myself): think about it more.


libos/src/ipc/libos_ipc_worker.c line 401 at r5 (raw file):

    /* IPC leader gets a notifier used in terminate_ipc_leader */
    if (is_ipc_leader() &&
        (ret = PalEventCreate(&g_leader_notifier, /*init_signaled=*/false, /*auto_clear=*/false)) < 0) {

Sorry, this now looks ugly. Can we make it more readable by splitting into two embedded IFs?


libos/src/ipc/libos_ipc_worker.c line 403 at r5 (raw file):

        (ret = PalEventCreate(&g_leader_notifier, /*init_signaled=*/false, /*auto_clear=*/false)) < 0) {
        log_error("IPC leader: PalEventCreate() failed");
        return pal_to_unix_errno(ret);;

Double ;; typo


libos/src/ipc/libos_ipc_worker.c line 429 at r5 (raw file):

/* Terminate the IPC worker. If 'force' is set, the IPC leader will not wait
 * until it has 0 connections but terminate immediatel, otherwise it will

immediately typo


libos/src/ipc/libos_ipc_worker.c line 430 at r5 (raw file):

/* Terminate the IPC worker. If 'force' is set, the IPC leader will not wait
 * until it has 0 connections but terminate immediatel, otherwise it will
 * wait until all child processes have died and there are 0 connections.

I would also mention that the force case is used when the user sends an explicit SIGTERM signal to the Gramine process.


libos/src/ipc/libos_ipc_worker.c line 431 at r5 (raw file):

 * until it has 0 connections but terminate immediatel, otherwise it will
 * wait until all child processes have died and there are 0 connections.
 */

We have a 100 chars per line limit, not 80


libos/src/sys/libos_exit.c line 25 at r5 (raw file):

     * 1) kill all other Gramine processes
     * 2) wait for them to exit here, before we terminate the IPC helper
     */

Don't you resolve this TODO comment (as described in option 2)?

Please remove this comment and instead add a comment before terminate_ipc_worker() call below, explaining the option 2.


libos/src/sys/libos_exit.c line 150 at r5 (raw file):

    /* At this point other threads might be still in the middle of an exit routine, but we don't
     * care since the below will call `exit_group` eventually. */
    libos_clean_and_exit(term_signal ? 128 + (term_signal & ~__WCOREDUMP_BIT) : error_code, term_signal);

Please split into two lines; you're exceeding the 100 chars per line limit.

@stefanberger stefanberger force-pushed the stefanberger/delay_ipc_leader_until_0_connections branch 2 times, most recently from 2070a5c to ed7eb2f Compare January 19, 2024 13:36
Copy link
Contributor Author

@stefanberger stefanberger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 3 files reviewed, 14 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, and @mkow)


-- commits line 13 at r5:

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

We don't use the term "root", we just use the term "main process" or "leader process". I'd prefer "main" here.

Done.


-- commits line 14 at r5:

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

The "leader" term applies to the process. We have one "main" process who is the parent of all "child" processes. We use the term "leader" to denote that the main process decides on some global-state events on behalf of its child processes.

Next, there is an "IPC worker/helper thread", which is a thread in each of the processes. There is no such thing as the "IPC leader", because this would imply "a leader among threads of the same process", which is incorrect.

So, the correct phrasing is "to terminate the IPC worker thread of the leader process". I would like this more explicit sentence rather than a vague "IPC leader".

Done.


libos/include/libos_ipc.h line 67 at r5 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Why not simply Whether to force the termination of the IPC thread

Then a curious reader may learn why we don't always force the termination by looking at the source code of this function.

Done.


libos/src/ipc/libos_ipc_worker.c line 125 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

TODO(myself): think about it more.

It is possible that a new child could be spawned right after the number of child processes has dropped to 0. Though in this case I would say we can switch to a forceful termination -- at least that's how it's going to look like.


libos/src/ipc/libos_ipc_worker.c line 437 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

TODO(myself): think about it more.

Done.


libos/src/ipc/libos_ipc_worker.c line 401 at r5 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Sorry, this now looks ugly. Can we make it more readable by splitting into two embedded IFs?

Done.


libos/src/ipc/libos_ipc_worker.c line 403 at r5 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Double ;; typo

Done. oopsy


libos/src/ipc/libos_ipc_worker.c line 429 at r5 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

immediately typo

Done.


libos/src/ipc/libos_ipc_worker.c line 430 at r5 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I would also mention that the force case is used when the user sends an explicit SIGTERM signal to the Gramine process.

Done.


libos/src/ipc/libos_ipc_worker.c line 431 at r5 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

We have a 100 chars per line limit, not 80

Done.


libos/src/sys/libos_exit.c line 25 at r5 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Don't you resolve this TODO comment (as described in option 2)?

Please remove this comment and instead add a comment before terminate_ipc_worker() call below, explaining the option 2.

Done.

Copy link
Contributor Author

@stefanberger stefanberger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 3 files reviewed, 14 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, and @mkow)


libos/src/sys/libos_exit.c line 150 at r5 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Please split into two lines; you're exceeding the 100 chars per line limit.

Done.

To avoid killing the IPC leader before all the child processes
have died, delay notifying the leader that it should terminate
until it has 0 connections.

Signed-off-by: Stefan Berger <[email protected]>
…eived

When the leader process receives a SIGTERM signal, allow it to forcefully
terminate the IPC worker thread of the leader process so that the process
can exit and all child processes can then also exit.

Signed-off-by: Stefan Berger <[email protected]>
@stefanberger stefanberger force-pushed the stefanberger/delay_ipc_leader_until_0_connections branch from 5fbfcf2 to a83d180 Compare February 27, 2024 02:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants