
Blocking queries may block longer than necessary in particular race conditions. #18266

Open
stswidwinski opened this issue Aug 20, 2023 · 2 comments


@stswidwinski
Contributor

Nomad version

Tip (current main).

Operating system and Environment details

Unix

Issue

The blocking query logic is written such that waiting on a trigger may race with the trigger itself, resulting in a block that is much longer than necessary. To be exact:

https://github.com/hashicorp/nomad/blob/main/nomad/rpc.go#L832-L870

Let us consider the following lines:

  1. Line 832: the background context (with timeout) is created
  2. Line 865: we wait on the background context (or the timeout)
  3. Everything in between is "the query"

Consider the following set of events:

  1. Context is created (Line 832)
  2. In the middle of "the query", a relevant event fires the notification we will later wait on
  3. The query finishes and we hit Line 865

The implementation of ws.WatchCtx is such that the notification from point 2 will be ignored and will not trigger a retry. This can cause long queries (sometimes running all the way to the timeout) in cases where the query could have been answered immediately.

One should likely begin observing the context before "the query" starts in order to handle retries correctly.
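
Below is a minimal, self-contained Go sketch of the ordering problem. It is not the Nomad or go-memdb implementation; the edge-triggered notify helper is an assumption made purely to illustrate why arming the wait only after "the query" can miss an event that fired during the query, while arming it beforehand does not.

// race_sketch.go -- NOT the Nomad implementation. A self-contained model of
// the ordering problem: if the wait is armed only after "the query" finishes,
// an event that fired during the query is missed and we block until timeout.
package main

import (
	"context"
	"fmt"
	"time"
)

// notify performs a non-blocking send: if nobody is listening at this instant,
// the event is lost. Treating the trigger as edge-triggered is an assumption
// of this sketch; only the ordering of "arm" vs. "query" matters.
func notify(ch chan struct{}) {
	select {
	case ch <- struct{}{}:
	default:
	}
}

// runQuery stands in for "the query" between lines 832 and 865.
func runQuery() { time.Sleep(500 * time.Millisecond) }

func main() {
	// Current ordering: run the query first, then start waiting.
	func() {
		ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
		defer cancel()
		notifyCh := make(chan struct{})
		go func() { time.Sleep(100 * time.Millisecond); notify(notifyCh) }()

		runQuery()
		start := time.Now()
		select {
		case <-notifyCh: // never fires: the event came and went during runQuery
			fmt.Println("query-then-wait: retry after", time.Since(start))
		case <-ctx.Done(): // instead we block until the deadline
			fmt.Println("query-then-wait: timed out after", time.Since(start))
		}
	}()

	// Suggested ordering: arm an observer before the query starts.
	func() {
		ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
		defer cancel()
		notifyCh := make(chan struct{})
		go func() { time.Sleep(100 * time.Millisecond); notify(notifyCh) }()

		fired := make(chan struct{})
		go func() { <-notifyCh; close(fired) }() // armed before the query runs

		runQuery()
		start := time.Now()
		select {
		case <-fired: // the event was captured while the query was running
			fmt.Println("arm-then-query: retry after", time.Since(start))
		case <-ctx.Done():
			fmt.Println("arm-then-query: timed out after", time.Since(start))
		}
	}()
}

Run as written, the first block times out after roughly 1.5 seconds of unnecessary waiting, while the second retries immediately.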

Reproduction steps

A simple repro is to explicitly race against state changes within "the query". Here is one way to achieve this with relatively high probability.

Step 1: Start the dev server

./nomad agent -dev

Step 2: Start running a tight loop of queries for a job allocations

Here is a sample script:

#!/usr/bin/bash

set -euo pipefail
set -x

NOMAD="./nomad"
TMP_DIR="/tmp/test"
mkdir -p $TMP_DIR

INDEX=1
while true; do
  # Parallel queries in hope of racing just right
  for i in {1..10}; do
    $NOMAD operator api -verbose "/v1/job/test_job-1/allocations?index=$INDEX&all=true" 2>$TMP_DIR/stderr-$i &
    pids[${i}]=$!
  done

  # wait for all pids
  for pid in ${pids[*]}; do
      wait $pid
  done

  # Next index.
  INDEX=$(cat "$TMP_DIR/stderr-10" |  grep X-Nomad-Index | sed 'sx.*: xx' |  sed 's/[^0-9].*//' || echo "1")
done

This loop runs forever, always blocking on the latest index it has seen. It is expected to keep returning information more or less in a tight loop and essentially never block for very long.
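
For reference, here is a hypothetical Go equivalent of the loop above using the official github.com/hashicorp/nomad/api client. It is not part of the original repro and issues one blocking query per iteration instead of ten parallel ones.

// blocking_loop.go -- hypothetical Go equivalent of the bash loop above.
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/hashicorp/nomad/api"
)

func main() {
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	index := uint64(1)
	for {
		// Equivalent of /v1/job/test_job-1/allocations?index=<n>&all=true:
		// block until the allocations index exceeds `index` or WaitTime elapses.
		allocs, meta, err := client.Jobs().Allocations("test_job-1", true, &api.QueryOptions{
			WaitIndex: index,
			WaitTime:  5 * time.Minute,
		})
		if err != nil {
			log.Fatal(err)
		}
		fmt.Printf("got %d allocations at index %d\n", len(allocs), meta.LastIndex)
		index = meta.LastIndex // reuse the latest index, as the bash loop does
	}
}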

Step 3: Start, Stop and GC a job

#!/usr/bin/bash

set -euo pipefail
set -x

NOMAD="./nomad"
TMP_DIR="/tmp/test"
mkdir -p $TMP_DIR

function create_job_spec () {
  echo "job \"test_job-$1\" {
    region = \"global\"
    type = \"batch\"

    group \"one_and_only\" {
      count = 10

      task \"test\" {
        driver = \"raw_exec\"

        config {
          command = \"/usr/bin/sleep\"
          args = [ \"$1\" ]
        }
      }
    }
  }"
}

# Start up a single job and then stop it and GC.
for i in {1..1}; do
  JOB_SPEC="$TMP_DIR/job-spec.hcl"
  echo "$(create_job_spec $i)" > $JOB_SPEC
  $NOMAD job run $JOB_SPEC
  $NOMAD job stop "test_job-$i"
done  

# Trip the GC
$NOMAD system gc &

Observed behavior

The tight loop of allocation queries will "deadlock", continuing to wait on an index that is lower than the state index after the GC of the job and its allocations. This continues until the timeout is hit. An exactly identical query (blocking on the same index) issued after the GC completes returns immediately without any problems.

If this doesn't repro for you immediately, just run Step 3 a couple of times. I can reproduce the behavior roughly 9 out of 10 times.

Expected Result

We block only as long as we need to and no longer.

Actual Result

We block potentially indefinitely.

@lgfa29
Contributor

lgfa29 commented Aug 21, 2023

Thanks for another detailed report @stswidwinski 😄

Just to make sure I understand the problem and the reproduction correctly, in step 2 you mentioned:

This loop runs forever, always blocking on the latest index it has seen. It is expected to keep returning information more or less in a tight loop and essentially never block for very long.

Is the script expected to keep printing outputs without blocking until step 3? If so, that's not what I'm seeing. All calls are blocked waiting for the index to change:

+ NOMAD=./nomad
+ TMP_DIR=/tmp/test
+ mkdir -p /tmp/test
+ INDEX=1
+ true
+ for i in '{1..10}'
+ pids[${i}]=73477
+ for i in '{1..10}'
+ ./nomad operator api -verbose '/v1/job/test_job-1/allocations?index=1&all=true'
+ pids[${i}]=73478
+ for i in '{1..10}'
+ ./nomad operator api -verbose '/v1/job/test_job-1/allocations?index=1&all=true'
+ pids[${i}]=73479
+ for i in '{1..10}'
+ ./nomad operator api -verbose '/v1/job/test_job-1/allocations?index=1&all=true'
+ pids[${i}]=73480
+ for i in '{1..10}'
+ ./nomad operator api -verbose '/v1/job/test_job-1/allocations?index=1&all=true'
+ ./nomad operator api -verbose '/v1/job/test_job-1/allocations?index=1&all=true'
+ pids[${i}]=73481
+ for i in '{1..10}'
+ pids[${i}]=73482
+ for i in '{1..10}'
+ ./nomad operator api -verbose '/v1/job/test_job-1/allocations?index=1&all=true'
+ pids[${i}]=73483
+ for i in '{1..10}'
+ ./nomad operator api -verbose '/v1/job/test_job-1/allocations?index=1&all=true'
+ pids[${i}]=73484
+ for i in '{1..10}'
+ ./nomad operator api -verbose '/v1/job/test_job-1/allocations?index=1&all=true'
+ pids[${i}]=73485
+ for i in '{1..10}'
+ ./nomad operator api -verbose '/v1/job/test_job-1/allocations?index=1&all=true'
+ pids[${i}]=73486
+ for pid in '${pids[*]}'
+ wait 73477
+ ./nomad operator api -verbose '/v1/job/test_job-1/allocations?index=1&all=true'

@stswidwinski
Contributor Author

stswidwinski commented Aug 21, 2023 via email
