add disk_mb resource to config for tools that consume a large amount of scratch space at runtime #149
A few of our modules (the STAR + bam2fastq combination, GRIDSS, battenberg) create fairly large temporary files that eat up a lot of scratch space. To help reduce the risk of a pipeline filling up all of the scratch space, we should give users a way to throttle their runs so that new jobs are not started if there is not enough scratch space theoretically available. This can be done per rule, so Snakemake knows at all times the maximum amount of scratch space the pipeline is theoretically using. Given the growing pressure on scratch space, I think we need to implement this for large projects such as GAMBL.
Is this different from setting e.g. […]?
I think so. The idea here is that the user is no longer required/expected to know what the data footprint of one bam-equivalent would be for each pipeline. Instead, the user would specify their resources as available disk_mb. For example, if I knew I had only 2 TB to work with in scratch, I could set my resource limit to 1.5 TB at most. This would prevent any new jobs from launching if they could exceed that. Once jobs complete, assuming they clean up after themselves, new jobs would be submitted that could consume the scratch space that had been freed up.
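Concretely, something like this, just as a sketch (the rule name, paths, wrapper script, and the 200 GB estimate are placeholders, not actual module code); Snakemake's standard resource mechanism already supports arbitrary per-rule resources such as `disk_mb`:

```python
# Hypothetical rule sketch: declare an estimate of the scratch space each job uses.
rule _gridss_call:
    input:
        bam = "data/{sample}.bam",
    output:
        vcf = "results/{sample}.gridss.vcf",
    resources:
        disk_mb = 200000  # placeholder estimate of temporary files written to scratch
    shell:
        "run_gridss.sh {input.bam} {output.vcf}"  # run_gridss.sh is a made-up wrapper
```

Launching the workflow with `snakemake --resources disk_mb=1500000 ...` (a 1.5 TB budget) then stops Snakemake from starting any job whose declared `disk_mb`, added to that of the jobs already running, would exceed the budget. One caveat: Snakemake releases the declared `disk_mb` as soon as a job finishes, so the accounting only stays honest if rules actually clean up (or `temp()`-flag) their scratch files.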
That makes sense. I think the only issue then is that the `disk_mb` value can vary pretty widely. For GRIDSS on fresh frozen samples the temporary footprint is around 20 GB; on FFPE it's sometimes 150-200 GB. But picking an intermediate/conservative value and putting some comments in the default config to guide users on when/how to change it would address that.
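A sketch of that idea (all names and numbers here are made up for illustration; in the real module `CFG` would come from the default YAML config, which is also where the guiding comment would live):

```python
# Conservative placeholder default; a comment in the default config would tell
# users to raise it (e.g. towards 200 GB) for FFPE samples.
CFG = {"resources": {"gridss_disk_mb": 50000}}

rule _gridss_call:
    input:
        bam = "data/{sample}.bam",
    output:
        vcf = "results/{sample}.gridss.vcf",
    resources:
        disk_mb = CFG["resources"]["gridss_disk_mb"]
    shell:
        "run_gridss.sh {input.bam} {output.vcf}"  # made-up wrapper
```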
I agree with that being an issue. I wonder if we could configure a few values for this that would be selected at runtime depending on the sample type? The same idea would apply for running battenberg on a genome vs. an exome.
Ah yes, a `switch_on_wildcard` could cover that.
Might it be more useful if we had FF/FFPE status as a wildcard, though? Or is there an alternative we can use that doesn't require a wildcard?
Using `switch_on_wildcard` to change the value based on FF/FFPE would require it to be a wildcard. It could also be driven by a column in the samples table, but that column would then have to exist in every samples table. I was thinking we'd just use `switch_on_wildcard` to change the value based on seq_type, and provide guidance to the user on increasing those values for FFPE samples. We also need to think about how this plays with the resource unpacking function we discussed with Bruno.
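Roughly what that could look like, sketch only: this assumes `op.switch_on_wildcard` takes the wildcard name plus a dict keyed by the wildcard's possible values, and that the callable it returns is accepted under `resources` the same way it is under `params`; the keys, numbers, paths, and wrapper script are placeholders.

```python
import oncopipe as op

rule _battenberg_run:
    input:
        bam = "data/{seq_type}/{sample}.bam",
    output:
        txt = "results/{seq_type}/{sample}.battenberg.txt",
    resources:
        # Pick the scratch estimate based on the seq_type wildcard; FFPE samples
        # would still need these placeholder values bumped manually.
        disk_mb = op.switch_on_wildcard("seq_type", {
            "genome": 150000,
            "capture": 20000,
        })
    shell:
        "run_battenberg.sh {input.bam} {output.txt}"  # made-up wrapper
```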
This is probably addressed by now since we have the gridss module working? Unsure whether this can be closed or whether there is still something left to address.