
Blocking queries may block longer than necessary in particular race conditions. #18266

Open
stswidwinski opened this issue Aug 20, 2023 · 2 comments


@stswidwinski
Contributor

Nomad version

Tip (current main).

Operating system and Environment details

Unix

Issue

The blocking query logic is written such that waiting on a trigger may race with the trigger itself, resulting in a block that is much longer than necessary. To be exact:

https://github.com/hashicorp/nomad/blob/main/nomad/rpc.go#L832-L870

Let us consider the following lines:

  1. Line 832: the background context (with timeout) is created
  2. Line 865: we wait on the background context (or the timeout)
  3. Everything in between is "the query"

Consider the following set of events:

  1. Context is created (Line 832)
  2. In the middle of "the query", a relevant event fires the notification we will later wait on
  3. The query finishes and we hit Line 865

The implementation of ws.WatchCtx is such that the notification from point 2 will be ignored and will not trigger a retry. This can cause long queries (sometimes running all the way to the timeout) in cases where the query could have been answered immediately.

One should likely begin observing the context before "the query" starts in order to handle retries correctly.
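
Below is a minimal, self-contained Go sketch of the ordering problem. It is not the Nomad or go-memdb implementation; the edge-triggered notify helper is an assumption made purely to illustrate why arming the wait only after "the query" can miss an event that fired during the query, while arming it beforehand does not.

// race_sketch.go -- NOT the Nomad implementation. A self-contained model of
// the ordering problem: if the wait is armed only after "the query" finishes,
// an event that fired during the query is missed and we block until timeout.
package main

import (
	"context"
	"fmt"
	"time"
)

// notify performs a non-blocking send: if nobody is listening at this instant,
// the event is lost. Treating the trigger as edge-triggered is an assumption
// of this sketch; only the ordering of "arm" vs. "query" matters.
func notify(ch chan struct{}) {
	select {
	case ch <- struct{}{}:
	default:
	}
}

// runQuery stands in for "the query" between lines 832 and 865.
func runQuery() { time.Sleep(500 * time.Millisecond) }

func main() {
	// Current ordering: run the query first, then start waiting.
	func() {
		ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
		defer cancel()
		notifyCh := make(chan struct{})
		go func() { time.Sleep(100 * time.Millisecond); notify(notifyCh) }()

		runQuery()
		start := time.Now()
		select {
		case <-notifyCh: // never fires: the event came and went during runQuery
			fmt.Println("query-then-wait: retry after", time.Since(start))
		case <-ctx.Done(): // instead we block until the deadline
			fmt.Println("query-then-wait: timed out after", time.Since(start))
		}
	}()

	// Suggested ordering: arm an observer before the query starts.
	func() {
		ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
		defer cancel()
		notifyCh := make(chan struct{})
		go func() { time.Sleep(100 * time.Millisecond); notify(notifyCh) }()

		fired := make(chan struct{})
		go func() { <-notifyCh; close(fired) }() // armed before the query runs

		runQuery()
		start := time.Now()
		select {
		case <-fired: // the event was captured while the query was running
			fmt.Println("arm-then-query: retry after", time.Since(start))
		case <-ctx.Done():
			fmt.Println("arm-then-query: timed out after", time.Since(start))
		}
	}()
}

Run as written, the first block times out after roughly 1.5 seconds of unnecessary waiting, while the second retries immediately.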

Reproduction steps

A simple repro is to explicitly race against state changes within "the query". Here is one way to achieve this with relatively high probability.

Step 1: Start the dev server

./nomad agent -dev

Step 2: Start running a tight loop of queries for a job allocations

Here is a sample script:

#!/usr/bin/bash

set -euo pipefail
set -x

NOMAD="./nomad"
TMP_DIR="/tmp/test"
mkdir -p $TMP_DIR

INDEX=1
while true; do
  # Parallel queries in hope of racing just right
  for i in {1..10}; do
    $NOMAD operator api -verbose "/v1/job/test_job-1/allocations?index=$INDEX&all=true" 2>$TMP_DIR/stderr-$i &
    pids[${i}]=$!
  done

  # wait for all pids
  for pid in ${pids[*]}; do
      wait $pid
  done

  # Next index.
  INDEX=$(cat "$TMP_DIR/stderr-10" |  grep X-Nomad-Index | sed 'sx.*: xx' |  sed 's/[^0-9].*//' || echo "1")
done

This loop runs forever, always blocking on the latest index it has seen. It is expected to keep returning information more or less in a tight loop and essentially never block for very long.
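
For reference, here is a hypothetical Go equivalent of the loop above using the official github.com/hashicorp/nomad/api client. It is not part of the original repro and issues one blocking query per iteration instead of ten parallel ones.

// blocking_loop.go -- hypothetical Go equivalent of the bash loop above.
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/hashicorp/nomad/api"
)

func main() {
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	index := uint64(1)
	for {
		// Equivalent of /v1/job/test_job-1/allocations?index=<n>&all=true:
		// block until the allocations index exceeds `index` or WaitTime elapses.
		allocs, meta, err := client.Jobs().Allocations("test_job-1", true, &api.QueryOptions{
			WaitIndex: index,
			WaitTime:  5 * time.Minute,
		})
		if err != nil {
			log.Fatal(err)
		}
		fmt.Printf("got %d allocations at index %d\n", len(allocs), meta.LastIndex)
		index = meta.LastIndex // reuse the latest index, as the bash loop does
	}
}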

Step 3: Start, Stop and GC a job

#!/usr/bin/bash

set -euo pipefail
set -x

NOMAD="./nomad"
TMP_DIR="/tmp/test"
mkdir -p $TMP_DIR

function create_job_spec () {
  echo "job \"test_job-$1\" {
    region = \"global\"
    type = \"batch\"

    group \"one_and_only\" {
      count = 10

      task \"test\" {
        driver = \"raw_exec\"

        config {
          command = \"/usr/bin/sleep\"
          args = [ \"$1\" ]
        }
      }
    }
  }"
}

# Start up a single job and then stop it and GC.
for i in {1..1}; do
  JOB_SPEC="$TMP_DIR/job-spec.hcl"
  echo "$(create_job_spec $i)" > $JOB_SPEC
  $NOMAD job run $JOB_SPEC
  $NOMAD job stop "test_job-$i"
done  

# Trip the GC
$NOMAD system gc &

Observed behavior

The tight loop of allocation queries will "deadlock", continuing to wait on an index that is lower than the state index after the GC of the job and its allocations. This continues until the timeout is hit. An exactly identical query (blocking on the same index) issued after the GC completes returns immediately without any problems.

If this doesn't repro for you immediately, just run Step 3 a couple of times. I can reproduce the behavior roughly 9 out of 10 times.

Expected Result

We block only as long as we need to and no longer.

Actual Result

We block potentially indefinitely.

@lgfa29
Contributor

lgfa29 commented Aug 21, 2023

Thanks for another detailed report @stswidwinski 😄

Just to make sure I understand the problem and the reproduction correctly, in step 2 you mentioned:

This loop runs forever, always blocking on the latest index it has seen. It is expected to keep returning information more or less in a tight loop and essentially never block for very long.

Is the script expected to keep printing outputs without blocking until step 3? If so, that's not what I'm seeing. All calls are blocked waiting for the index to change:

+ NOMAD=./nomad
+ TMP_DIR=/tmp/test
+ mkdir -p /tmp/test
+ INDEX=1
+ true
+ for i in '{1..10}'
+ pids[${i}]=73477
+ for i in '{1..10}'
+ ./nomad operator api -verbose '/v1/job/test_job-1/allocations?index=1&all=true'
+ pids[${i}]=73478
+ for i in '{1..10}'
+ ./nomad operator api -verbose '/v1/job/test_job-1/allocations?index=1&all=true'
+ pids[${i}]=73479
+ for i in '{1..10}'
+ ./nomad operator api -verbose '/v1/job/test_job-1/allocations?index=1&all=true'
+ pids[${i}]=73480
+ for i in '{1..10}'
+ ./nomad operator api -verbose '/v1/job/test_job-1/allocations?index=1&all=true'
+ ./nomad operator api -verbose '/v1/job/test_job-1/allocations?index=1&all=true'
+ pids[${i}]=73481
+ for i in '{1..10}'
+ pids[${i}]=73482
+ for i in '{1..10}'
+ ./nomad operator api -verbose '/v1/job/test_job-1/allocations?index=1&all=true'
+ pids[${i}]=73483
+ for i in '{1..10}'
+ ./nomad operator api -verbose '/v1/job/test_job-1/allocations?index=1&all=true'
+ pids[${i}]=73484
+ for i in '{1..10}'
+ ./nomad operator api -verbose '/v1/job/test_job-1/allocations?index=1&all=true'
+ pids[${i}]=73485
+ for i in '{1..10}'
+ ./nomad operator api -verbose '/v1/job/test_job-1/allocations?index=1&all=true'
+ pids[${i}]=73486
+ for pid in '${pids[*]}'
+ wait 73477
+ ./nomad operator api -verbose '/v1/job/test_job-1/allocations?index=1&all=true'

@stswidwinski
Contributor Author

stswidwinski commented Aug 21, 2023 via email
