libflux: add flux_send_new() #5499

garlick · 2023-10-12T21:15:17Z

Problem: it's a common pattern to create a message, send it with flux_send(), then destroy it; however, there is no interface to allow the message ownership to be transferred rather than copied.

Add flux_send_new() which accepts a non-const flux_msg_t ** and implement it in the interthread connector. Now messages can pass from one end to the other without costly duplication.

Use this function in the common flux_rpc() and flux_respond() family of functions.

This is marked WIP for now while I consider whether it's possible to use this in the broker along the message path. It would be neat if rpcs between broker modules could transit the broker without copying!

Even just this change does seem to have a positive impact on job throughput.

Problem: some older rpc and response handling code does not conform to modern project norms.

Make the following changes:
- break long compound conditionals to one per line
- break long parameter lists to one per line

garlick · 2023-10-13T00:10:09Z

I corrected a problem where I was updating a message reference count outside of the lock when it should have been inside.
Also added some unit tests.

Dropping the WIP since I think any work in the broker is going to need to be its own PR.

garlick · 2023-10-13T13:55:19Z

Looks like caliper is always enabled in CI. When caliper is enabled, flux_send_new() was falling back to flux_send(), which caused one of the unit tests to fail and would have prevented coverage of the new code.

To address this I just moved the call to profiling_msg_snapshot() to before the send call so it could be used in flux_send_new() without a use-after-free. I think this is OK. If the send fails, it means caliper might see an RPC request without a response, but of course requests and responses are not evenly matched in flux anyway...

Problem: it's a common pattern to create a message, send it with flux_send(), then destroy it; however, there is no interface to allow flux_send() to just take over the message and thereby avoid a costly message copy if/when possible.

Add flux_send_new() which accepts a non-const flux_msg_t **. The reference count must be 1, and on success *msg is set to NULL, so there should be no possibility of the message being reused. This just calls flux_send() internally if the connector does not implement op->send_new(), or if other optional things in the environment make calling op->send_new() unworkable (such as RPC tracking, which wants to temporarily hold a copy of each request until a response is received).

Move the call to profiling_msg_snapshot() to before the send rather than after so send_new() doesn't have to fall back to send() when flux is built with caliper profiling.
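The contract described in this commit message can be modeled with a small self-contained sketch. Everything here is a toy stand-in: `toy_msg`, `toy_handle`, `toy_send_new()` and the two `toy_op_*` callbacks are hypothetical names, not the libflux types or functions. The EINVAL error and the destroy-after-copy step in the fallback path are assumptions about how such a wrapper might behave, not something confirmed above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins for flux_msg_t and flux_t; NOT the libflux types. */
struct toy_msg {
    int refcount;
    int payload;
};

struct toy_handle {
    /* Copying send: the caller keeps ownership of msg. */
    int (*send) (struct toy_handle *h, const struct toy_msg *msg);
    /* Optional zero-copy send: takes ownership of *msg on success. */
    int (*send_new) (struct toy_handle *h, struct toy_msg **msg);
    int copies;             /* how many times a message was duplicated */
    struct toy_msg *inbox;  /* single-slot "other end" of the channel */
};

struct toy_msg *toy_msg_create (int payload)
{
    struct toy_msg *msg = calloc (1, sizeof (*msg));
    if (!msg)
        return NULL;
    msg->refcount = 1;
    msg->payload = payload;
    return msg;
}

void toy_msg_decref (struct toy_msg *msg)
{
    if (msg) {
        assert (msg->refcount > 0);
        if (--msg->refcount == 0)
            free (msg);
    }
}

/* Copying path: duplicate the message for the receiver. */
int toy_op_send (struct toy_handle *h, const struct toy_msg *msg)
{
    struct toy_msg *cpy = toy_msg_create (msg->payload);
    if (!cpy)
        return -1;
    toy_msg_decref (h->inbox);  /* drop any undelivered message */
    h->inbox = cpy;
    h->copies++;
    return 0;
}

/* Zero-copy path: move the pointer itself to the receiver. */
int toy_op_send_new (struct toy_handle *h, struct toy_msg **msg)
{
    toy_msg_decref (h->inbox);  /* drop any undelivered message */
    h->inbox = *msg;
    *msg = NULL;
    return 0;
}

/* Wrapper: require sole ownership, prefer send_new, else fall back. */
int toy_send_new (struct toy_handle *h, struct toy_msg **msg)
{
    if (!h || !msg || !*msg || (*msg)->refcount != 1) {
        errno = EINVAL;         /* assumed error code for this sketch */
        return -1;
    }
    if (h->send_new)
        return h->send_new (h, msg);
    /* Fallback: copying send, then release the caller's message. */
    if (h->send (h, *msg) < 0)
        return -1;
    toy_msg_decref (*msg);
    *msg = NULL;
    return 0;
}
```

In either path the caller ends up with `*msg == NULL` on success, so the calling code does not need to know whether the connector supported the zero-copy operation.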

Problem: several well traveled code paths in libflux call flux_send() then immediately destroy the message, thereby not communicating that the message need not be copied before modification. Call flux_send_new() in rpc, response, and logging code.

grondo · 2023-10-13T14:33:23Z

> When caliper is enabled, flux_send_new() was falling back to flux_send(),

Oh, I was wondering why the function always fell back to flux_send() if HAVE_CALIPER was enabled. Just out of curiosity, why was that?

garlick · 2023-10-13T15:01:51Z

profiling_msg_snapshot() was being called after the send, and in the send_new case, the message is no longer available after the send. My realization was that it probably doesn't need to be called after the send.
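The ordering issue can be illustrated with a minimal sketch. The names below (`toy_profile_snapshot()`, `toy_send_new()`, `toy_send_with_profiling()`) are hypothetical and only mimic the shape of the problem; the real profiling hook and its arguments are not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy message; NOT flux_msg_t. */
struct toy_msg {
    char topic[32];
};

static char last_profiled_topic[32];
static struct toy_msg *receiver_slot;

/* Hypothetical stand-in for a profiling hook that reads the message. */
void toy_profile_snapshot (const struct toy_msg *msg)
{
    assert (msg != NULL);   /* would fire if called after the send */
    strncpy (last_profiled_topic,
             msg->topic,
             sizeof (last_profiled_topic) - 1);
    last_profiled_topic[sizeof (last_profiled_topic) - 1] = '\0';
}

/* Hypothetical zero-copy send: hands the message to the receiver. */
int toy_send_new (struct toy_msg **msg)
{
    receiver_slot = *msg;   /* the receiver now owns (and may free) it */
    *msg = NULL;
    return 0;
}

/* Snapshot first, while the sender still owns the message. */
int toy_send_with_profiling (struct toy_msg **msg)
{
    toy_profile_snapshot (*msg);
    return toy_send_new (msg);
}
```

Swapping the two calls in `toy_send_with_profiling()` would pass NULL to the snapshot function, which is the toy equivalent of the use-after-free mentioned earlier.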

codecov · 2023-10-13T15:08:56Z

Codecov Report

Merging #5499 (4898293) into master (cbd871d) will decrease coverage by 0.03%.
The diff coverage is 91.80%.

@@            Coverage Diff             @@
##           master    #5499      +/-   ##
==========================================
- Coverage   83.69%   83.66%   -0.03%     
==========================================
  Files         484      484              
  Lines       81561    81537      -24     
==========================================
- Hits        68261    68218      -43     
- Misses      13300    13319      +19

| Files | Coverage Δ | |
|---|---|---|
| src/broker/module.c | `78.34% <ø> (-0.14%)` | ⬇️ |
| src/common/libflux/rpc.c | `90.99% <100.00%> (+0.12%)` | ⬆️ |
| src/common/libflux/flog.c | `84.21% <66.66%> (-1.88%)` | ⬇️ |
| src/common/libflux/connector_interthread.c | `87.75% <88.23%> (+0.57%)` | ⬆️ |
| src/common/libflux/handle.c | `86.48% <86.36%> (-0.08%)` | ⬇️ |
| src/common/libflux/response.c | `79.18% <90.69%> (-0.38%)` | ⬇️ |

... and 13 files with indirect coverage changes

grondo · 2023-10-13T15:12:44Z

Ah, so the connector implementation of send_new() is required to destroy the message, it isn't done in the flux_send_new() wrapper? (and would it improve things if it were done that way?)

garlick · 2023-10-13T15:19:10Z

No, the send_new() implementation doesn't destroy the message. It takes ownership from the sender and transfers it to the other end of the channel where it's used by the receiver.
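As a rough illustration of that hand-off, here is a sketch of a channel that queues message pointers rather than copies. `toy_chan`, `toy_chan_send_new()` and `toy_chan_recv()` are hypothetical names, and the sketch is single-threaded; a real interthread channel would need locking around the queue, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_CHAN_SLOTS 4

/* Toy message; NOT flux_msg_t. */
struct toy_msg {
    int seq;
};

/* Fixed-size FIFO of message pointers (hypothetical, no locking). */
struct toy_chan {
    struct toy_msg *slot[TOY_CHAN_SLOTS];
    size_t head;    /* next slot to receive from */
    size_t count;   /* messages currently queued */
};

/* Sender side: move the pointer into the channel, no copy. */
int toy_chan_send_new (struct toy_chan *ch, struct toy_msg **msg)
{
    if (ch->count == TOY_CHAN_SLOTS)
        return -1;                  /* full: caller still owns *msg */
    ch->slot[(ch->head + ch->count) % TOY_CHAN_SLOTS] = *msg;
    ch->count++;
    *msg = NULL;                    /* ownership now rests with the channel */
    return 0;
}

/* Receiver side: take the pointer out; the receiver must free it. */
struct toy_msg *toy_chan_recv (struct toy_chan *ch)
{
    struct toy_msg *msg;

    if (ch->count == 0)
        return NULL;
    msg = ch->slot[ch->head];
    assert (msg != NULL);
    ch->slot[ch->head] = NULL;
    ch->head = (ch->head + 1) % TOY_CHAN_SLOTS;
    ch->count--;
    return msg;
}
```

The receiver gets back the same address the sender allocated, and it is the receiver, not the send path, that eventually frees the message.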

grondo · 2023-10-13T15:45:34Z

Ok, thanks! That makes sense now.

grondo

LGTM!

Problem: the interthread connector plugin does not implement the send_new() operation, which can reduce message copying. Add a send_new() operation.

Problem: there is no unit test coverage for flux_send_new(). Add some tests to the handle unit test.

Problem: the broker module internally defines a message credential that is no longer used, now that the interthread connector does that. Drop it.

Problem: there is no man page for flux_send_new(). Add it to flux_send.rst and configure a stub to be produced for flux_send_new(). The previous synopsis function prototypes were not within a literal block and had to be escaped. Fix that while adding the new function.

garlick · 2023-10-13T17:00:01Z

Thanks! I'll set MWP.

garlick · 2023-10-13T17:30:53Z

Got a couple of test failures in the el8,coverage builder - likely not related to this PR?

2023-10-13T16:56:53.4553772Z ##[error]not ok 8 - 0002-exec-with-imp.t: flux exec --with-imp forwards signals
2023-10-13T16:56:53.4558585Z ##[error]not ok 15 - 0004-recovery.t: flux start --recover works
2023-10-13T16:56:53.4561770Z ##[error]ERROR: t9000-system.t - exited with status 1

and nothing else really to go on in the test logs. I'll retry.

libflux: fix code formatting

4f9b363

Problem: some older rpc and response handling code does not conform to modern project norms.

Make the following changes:
- break long compound conditionals to one per line
- break long parameter lists to one per line

garlick force-pushed the send_new branch from 78bf6d4 to 6605bec Compare October 13, 2023 00:08

garlick changed the title ~~WIP: libflux: add flux_send_new()~~ libflux: add flux_send_new() Oct 13, 2023

garlick mentioned this pull request Oct 13, 2023

not ok 11 - flux-watch: works with --since #5501

Closed

garlick force-pushed the send_new branch from 6605bec to 492c673 Compare October 13, 2023 13:41

garlick added 2 commits October 13, 2023 06:56

garlick force-pushed the send_new branch from 492c673 to dd1eab0 Compare October 13, 2023 13:57

garlick force-pushed the send_new branch from 4898293 to b9734ba Compare October 13, 2023 16:26

grondo approved these changes Oct 13, 2023

garlick added 4 commits October 13, 2023 09:36

interthread: add op->send_new()

8e8a184

Problem: the interthread connector plugin does not implement the send_new() operation, which can reduce message copying. Add a send_new() operation.

testsuite: cover flux_send_new()

d564f17

Problem: there is no unit test coverage for flux_send_new(). Add some tests to the handle unit test.

broker: drop unused module cred

4b57e45

Problem: the broker module internally defines a message credential that is no longer used, now that the interthread connector does that. Drop it.

garlick force-pushed the send_new branch from b9734ba to e35504a Compare October 13, 2023 16:38

garlick added the merge-when-passing label Oct 13, 2023

mergify bot merged commit f6547f4 into flux-framework:master Oct 13, 2023
30 checks passed

garlick deleted the send_new branch October 13, 2023 17:44
