Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Add support for growing allocations #1340

Draft
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

milroy
Copy link
Member

@milroy milroy commented Feb 14, 2025

This PR adds grow_allocate functionality, which attempts to allocate new resources specified by a jobspec to an existing jobid. The PR adds necessary functionality to the traverser, resource module, and REAPI, and adds testsuite tests for correct behavior.

This PR is WIP pending client testing in FluxQueue, Fluence, and additional testsuite tests.

Copy link
Member

@vsoch vsoch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🍔

goto error;
}
} else {
if (strcmp ("grow_allocate", cmd) == 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to solidify my understanding here:

  1. When there is a call to the match request callback, the only condition (match op) this is allowed for is grow_allocate
  2. In the cast that the jobid exists but it's not grow_allocate, that is an error case. Only that one supports an existing id.
  3. In the case we don't have a job id and it is grow_allocate that's another error because we require one!
  4. Shrinking an allocation is done via partial cancel (thus not in this function)

One case I'm interested to test is if it's possible to shrink down to 0 and then go back up, and also what happens if you try to grow or shrink in unreasonable ways (where those ways are TBA!).

@@ -99,7 +100,10 @@ int dfu_traverser_t::request_feasible (detail::jobmeta_t const &meta,
get_graph_db ()->metadata.graph_duration.graph_end.time_since_epoch ())
.count ();
// only the initial time matters for allocate
const int64_t target_time = op == match_op_t::MATCH_ALLOCATE ? meta.at : graph_end - 1;
const int64_t target_time =
(op == match_op_t::MATCH_ALLOCATE || op == match_op_t::MATCH_GROW_ALLOCATION)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this syntax say "if the statement is true (if it's allocate or grow) default to the meta.at, otherwise the graph end -1 (or is it the other way around)? I know this syntax exists in JavaScript and I always get it wrong.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And if that is the case, it's saying that if we grow or match allocate, we want the time to be the job metadata "at" which I assume is the time matched at? Just to check - for grow do we have a new at time for "at" set here in the job metadata (otherwise it would be the previous at associated with the jobid when the first allocation was done)?

@@ -118,7 +122,8 @@ int dfu_traverser_t::request_feasible (detail::jobmeta_t const &meta,
}
if (feasible_nodes < target_nodes) {
// no chance, don't even try
if (op == match_op_t::MATCH_ALLOCATE_ORELSE_RESERVE || op == match_op_t::MATCH_ALLOCATE) {
if (op == match_op_t::MATCH_ALLOCATE_ORELSE_RESERVE || op == match_op_t::MATCH_ALLOCATE
|| op == match_op_t::MATCH_GROW_ALLOCATION) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EEEEEEEEEEEEBUSY continues doing nodey things ⌨️

@@ -217,6 +222,7 @@ int dfu_traverser_t::schedule (Jobspec::Jobspec &jobspec,
break;
}
case match_op_t::MATCH_ALLOCATE:
case match_op_t::MATCH_GROW_ALLOCATION:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I understand this syntax - does this mean do nothing for match allocate, or that match allocate is the same as match grow allocate? I would guess the latter since one line was just added (so we want EBUSY for both cases).

In that case, this is saying if no resources are available, just return busy. I think this answers my question about reserve - we can't reserve with an update. It's a yes / no, and right now, if we don't have it ask again later!

} else {
jobid = static_cast<int64_t> (std::strtoll (args[3].c_str (), NULL, 10));
if (!ctx->jobs.contains (jobid)) {
std::cerr << "ERROR: nonexistent jobid " << jobid << std::endl;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is (I think) a third check for existence - I am guessing this is done because the state might change, and better to be safe than sorry.

Copy link

codecov bot commented Feb 14, 2025

Codecov Report

Attention: Patch coverage is 49.12281% with 29 lines in your changes missing coverage. Please review.

Project coverage is 75.6%. Comparing base (da6156a) to head (b64bc45).

Files with missing lines Patch % Lines
resource/modules/resource_match.cpp 20.0% 16 Missing ⚠️
resource/reapi/bindings/c++/reapi_cli_impl.hpp 16.6% 5 Missing ⚠️
resource/policies/base/match_op.h 25.0% 3 Missing ⚠️
resource/traversers/dfu.cpp 62.5% 3 Missing ⚠️
resource/utilities/command.cpp 92.3% 1 Missing ⚠️
resource/utilities/rq2.cpp 83.3% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##           master   #1340     +/-   ##
========================================
- Coverage    75.7%   75.6%   -0.1%     
========================================
  Files         112     112             
  Lines       16410   16450     +40     
========================================
+ Hits        12426   12444     +18     
- Misses       3984    4006     +22     
Files with missing lines Coverage Δ
resource/utilities/command.cpp 79.2% <92.3%> (+0.1%) ⬆️
resource/utilities/rq2.cpp 56.3% <83.3%> (+0.2%) ⬆️
resource/policies/base/match_op.h 52.9% <25.0%> (-11.4%) ⬇️
resource/traversers/dfu.cpp 86.3% <62.5%> (-0.5%) ⬇️
resource/reapi/bindings/c++/reapi_cli_impl.hpp 39.2% <16.6%> (-0.4%) ⬇️
resource/modules/resource_match.cpp 70.3% <20.0%> (-0.4%) ⬇️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants