[ related to metrumresearchgroup/bbr#685, +cc @seth127 @barrettk ]
For the various errors where `RecordConcurrentError` is invoked (including a non-zero exit of the `nmfe*` subprocess; see bbi/cmd/local.go, lines 185 to 191 at 835953e), `true` is sent to the `cancel` channel (see bbi/cmd/root.go, lines 177 to 178 at 835953e), which stops the execution before the turnstile cleanup phase.
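For illustration, here's a minimal sketch of that cancel-channel pattern. The structure and names are hypothetical, not bbi's actual code; the point is just that once `true` lands on the channel, execution never reaches the cleanup step:

```go
package main

import (
	"fmt"
	"os/exec"
)

func main() {
	// cancel mirrors bbi's cancel channel: errors are reported by
	// sending true, which the main goroutine treats as "stop now".
	cancel := make(chan bool, 1)
	done := make(chan struct{})

	go func() {
		// "false" always exits non-zero, standing in for a failed
		// nmfe* subprocess.
		if err := exec.Command("false").Run(); err != nil {
			// Analogous to RecordConcurrentError: signal an abort.
			cancel <- true
			return
		}
		close(done)
	}()

	select {
	case <-cancel:
		// Execution stops here, before any cleanup phase, so work
		// like writing bbi_config.json never happens.
		fmt.Println("canceled: cleanup skipped")
	case <-done:
		fmt.Println("finished: cleanup would run here")
	}
}
```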
As a result of that early cancellation, `bbi_config.json` isn't written. As discussed at metrumresearchgroup/bbr#685, from bbr's perspective it would be useful to have `bbi_config.json` consistently written as an indication that a model run has completed (even if unsuccessfully).

When considering this change, we should go through local's `Cleanup` and decide which parts should run in case of a failure.

Here are three related issues mentioned in the bbr thread. (At this point I'll track them here, but if they don't end up getting handled by the same series, they should be split off into dedicated issues.)
- if bbi aborts early due to the data file not existing, it doesn't write the config file or create the output directory. From a consistency perspective, it might be good if `bbi_config.json` were created. I'm not sure where to draw the line between a pure bbi early non-zero exit (e.g., unknown argument given) and a model run error. But if the data file doesn't exist, that's something nmfe would fail on, and bbi is just catching it early, so it seems reasonable for it to behave consistently with other nmfe failures. There are perhaps other cases that behave similarly to the non-existent data path.
- if `--overwrite` is passed but bbi aborts due to a non-existent data path (again, there may be other cases), the directory is not removed. That's confusing from bbr's perspective because it looks at `bbi_config.json` and will think it's from its latest model submission.
- bbr is interested in knowing whether a model is currently running. To help, bbi could do the combination of 1) adding a file before launching `nmfe*` and removing it on cleanup and 2) ensuring cleanup still fires when the `nmfe*` subprocess exits with a non-zero status. (A sketch of this pattern follows the list.)
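As a sketch of that last point, here's one way the sentinel-plus-deferred-cleanup combination could look. The file name `bbi_running` and the overall structure are assumptions for illustration, not bbi's actual implementation:

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
)

// runWithSentinel drops a sentinel file in outputDir before launching
// the command. The deferred removal runs on every return path, including
// a non-zero exit, so "is the model running?" can be answered by
// checking for the sentinel's existence.
func runWithSentinel(outputDir string, cmd *exec.Cmd) error {
	sentinel := filepath.Join(outputDir, "bbi_running") // hypothetical name
	if err := os.WriteFile(sentinel, nil, 0o644); err != nil {
		return err
	}
	// defer fires even when cmd.Run returns an error below.
	defer os.Remove(sentinel)

	return cmd.Run()
}

func main() {
	dir, _ := os.MkdirTemp("", "demo")
	defer os.RemoveAll(dir)

	// "false" always exits non-zero, standing in for a failed nmfe* run.
	if err := runWithSentinel(dir, exec.Command("false")); err != nil {
		fmt.Println("run failed:", err)
	}
	if _, err := os.Stat(filepath.Join(dir, "bbi_running")); os.IsNotExist(err) {
		fmt.Println("sentinel removed even though the run failed")
	}
}
```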