Skip to content

Commit

Permalink
+findHighQualitySNPs
Browse files Browse the repository at this point in the history
  • Loading branch information
lima1 committed Sep 13, 2024
1 parent 476f205 commit 0810ee5
Show file tree
Hide file tree
Showing 5 changed files with 101 additions and 2 deletions.
4 changes: 2 additions & 2 deletions DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -2,8 +2,8 @@ Package: PureCN
Type: Package
Title: Copy number calling and SNV classification using
targeted short read sequencing
Version: 2.11.1
Date: 2024-05-10
Version: 2.11.2
Date: 2024-09-13
Authors@R: c(person("Markus", "Riester",
role = c("aut", "cre"),
email = "[email protected]",
Expand Down
1 change: 1 addition & 0 deletions NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@ export(filterVcfBasic)
export(filterVcfMuTect)
export(filterVcfMuTect2)
export(findFocal)
export(findHighQualitySNPs)
export(getSexFromCoverage)
export(getSexFromVcf)
export(plotAbs)
Expand Down
2 changes: 2 additions & 0 deletions NEWS
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,8 @@ NEW FEATURES
Tuning parameter for coverage normalization. Default should in most cases
be a good compromise between removing most normal noise and not
overfitting to pool of normal samples.
o Added findHighQualitySNPs function to extract good SNPs from mapping bias
database.


Changes in version 2.10.0
Expand Down
47 changes: 47 additions & 0 deletions R/calculateMappingBiasVcf.R
Original file line number Diff line number Diff line change
Expand Up @@ -443,3 +443,50 @@ calculateMappingBiasGatk4 <- function(workspace, reference.genome,
gr$clustered[idxa] <- TRUE
return(gr)
}

#' Find High Quality SNPs
#'
#' Function to extract high quality SNPs from the mapping bias database.
#' Useful for generating fingerprinting panels etc.
#'
#'
#' @param mapping.bias.file Generated by \code{\link{calculateMappingBiasVcf}}.
#' @param max.bias Maximum mapping bias
#' @param min.pon Minimum number of normal samples, useful to get reliable
#' mapping bias.
#' @param triallelic By default, ignore positions with multiple alt alleles.
#' @param vcf.file Optional VCF file (for example dbSNP). Needs to be
#' bgzip and tabix processed.
#' @param genome See \code{readVcf}
#' @return A \code{GRanges} object with mapping bias passing filters.
#' If \code{vcf.file} is provided, it will be the variants in the
#' corresponding file overlapping with the passed variants.
#' @author Markus Riester
#' @examples
#'
#' normal.panel.vcf <- system.file("extdata", "normalpanel.vcf.gz",
#' package = "PureCN")
#' bias <- calculateMappingBiasVcf(normal.panel.vcf, genome = "h19")
#'
#' @export findHighQualitySNPs
findHighQualitySNPs <- function(mapping.bias.file, max.bias = 0.2, min.pon = 2,
triallelic = FALSE, vcf.file = NULL, genome) {
if (is(mapping.bias.file, "GRanges")) {
bias <- mapping.bias.file
} else {
bias <- readRDS(mapping.bias.file)
}
bias <- bias[which(abs(bias$bias - 1) <= max.bias & (triallelic | !bias$triallelic) & bias$pon.count >= min.pon), ]
if (is.null(vcf.file)) return(bias)
vcfSeqStyle <- .getSeqlevelsStyle(headerTabix(
TabixFile(vcf.file))$seqnames)

biasRenamedSL <- bias
if (!length(intersect(seqlevelsStyle(bias), vcfSeqStyle))) {
seqlevelsStyle(biasRenamedSL) <- vcfSeqStyle[1]
}
flog.info("Reading VCF...")
vcf <- readVcf(vcf.file, genome = genome, ScanVcfParam(which = reduce(biasRenamedSL)))
ov <- findOverlaps(biasRenamedSL, vcf, type = "equal")
return(vcf[subjectHits(ov)])
}
49 changes: 49 additions & 0 deletions man/findHighQualitySNPs.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

0 comments on commit 0810ee5

Please sign in to comment.