-
I'm new to noodles, I like it a lot. It seems like a lot of new examples have been added in the last few months. I find that really helpful. I have a task where I need to iterate over a set of chromosomes a bam's records are mapping to. Is it better to (a) loop through those chromosomes with query() or (b) loop through all reads with lazy_records(), get the ref chrom name from a lazy record, then test a pre-initialized HashSet of my particular chrom names for membership? Are the records returned by query() lazily-loaded anyway? Thanks. |
Beta Was this translation helpful? Give feedback.
Replies: 1 comment 1 reply
-
In general, if the BAM is indexed, I recommend using use std::env;
use noodles::{bam, core::Region};
fn main() -> Result<(), Box<dyn std::error::Error>> {
let mut args = env::args().skip(1);
let src = args.next().expect("missing src");
let regions: Vec<Region> = args.map(|s| s.parse()).collect::<Result<_, _>>()?;
let mut reader = bam::indexed_reader::Builder::default().build_from_path(src)?;
let header = reader.read_header()?;
let mut n = 0;
for region in regions {
for result in reader.query(&header, ®ion)? {
let _record = result?;
n += 1;
}
}
println!("{n}");
Ok(())
}
If your BAM is not indexed, then sequentially reading each record and filtering by your set of regions is certainly viable. Lazy records are always completely loaded, as its shape is at least validated. They are lazy in the sense that fields are parsed on demand. |
Beta Was this translation helpful? Give feedback.
In general, if the BAM is indexed, I recommend using
Reader::query
. This parses and validates the data in each record and returns it as an ownedsam::alignment::Record
. Here is a basic count example that counts the number of records in a list of given regions.