Likelihood based distance #20

Open
OJWatson opened this issue Feb 24, 2022 · 0 comments
Labels
feature ✨ feature request or enhancement

OJWatson commented Feb 24, 2022

Currently, we work out residuals between the theoretical COIs and the observed data.

This was done because initially we were using buckets to work out averages and then comparing those.

With the regression approach, though, we use the individual data. We could now construct a formal likelihood here, by saying that our observed data (whether Method 1 or 2) are described by a Binomial distribution, with the number of trials being the coverage and, for Method 2, the number of successes being `as.integer(wsmaf * coverage)` (`as.integer` here just to guard against rounding errors, since `wsmaf` came from dividing reads by coverage initially).

E.g. a demo for Method 2:

```r
data2 <- sim_biallelic(6, runif(10000, 0, 0.5), epsilon = 0.02,
                       coverage = sample(300, 10000, replace = TRUE))

# Drop loci with wsmaf near 0 or 1 (sequencing-error filter)
data <- data2$data
data <- data[data$wsmaf > 0.03 & data$wsmaf < 0.97, ]
counts <- data$counts
coverage <- data$coverage
plmaf <- data$plmaf

# Theoretical wsmaf curves for candidate COIs 2:10
thcois <- theoretical_coi(2:10, plmaf, coi_method = "frequency")

# Binomial log-likelihood of the observed counts under each COI curve,
# summed over loci; the final column of thcois (plmaf) is dropped
lls <- dbinom(x = counts, size = coverage,
              prob = as.matrix(thcois[, -ncol(thcois)]), log = TRUE)
lls <- colSums(lls)
which.max(lls)
coi_6 
    5 
```
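For intuition, here is a minimal Python analogue of the same calculation. It is a sketch under simplifying assumptions: counts are simulated directly from the theoretical curve (no sequencing error, no wsmaf filtering), and the frequency-method curve is taken to be `1 - (1 - plmaf)^COI`, which may not match `theoretical_coi()` exactly.

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(42)

# Simplified stand-in for sim_biallelic(): simulate read counts at
# 10,000 loci for a sample with true COI = 6. plmaf is the
# population-level minor allele frequency at each locus.
n_loci = 10_000
coi_true = 6
plmaf = rng.uniform(0, 0.5, n_loci)
coverage = rng.integers(1, 301, n_loci)

# Assumed frequency-method curve: probability that at least one of the
# COI strains carries the minor allele.
expected_wsmaf = 1 - (1 - plmaf) ** coi_true
counts = rng.binomial(coverage, expected_wsmaf)

# Binomial log-likelihood of the observed counts under each candidate
# COI curve, summed over loci (mirrors dbinom(..., log = TRUE) + colSums).
cois = np.arange(2, 11)
theoretical = 1 - (1 - plmaf[:, None]) ** cois[None, :]  # n_loci x n_cois
lls = binom.logpmf(counts[:, None], coverage[:, None], theoretical).sum(axis=0)

best = int(cois[np.argmax(lls)])
print(best)  # recovers the true COI of 6
```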

Just did a quick check to see how different these approaches are:

```r
fgh <- function() {
  data2 <- sim_biallelic(6, runif(1000, 0, 0.5), epsilon = 0.02,
                         coverage = sample(50, 1000, replace = TRUE))

  data <- data2$data
  data <- data[data$wsmaf > 0.03 & data$wsmaf < 0.97, ]
  counts <- data$counts
  coverage <- data$coverage
  plmaf <- data$plmaf

  thcois <- theoretical_coi(2:25, plmaf, coi_method = "frequency")

  lls <- dbinom(x = counts, size = coverage,
                prob = as.matrix(thcois[, -ncol(thcois)]), log = TRUE)
  lls <- colSums(lls)

  # which.max indexes into 2:25, so add 1 to recover the COI itself;
  # compare against the residual-based compute_coi() estimate
  c(as.numeric(which.max(lls) + 1),
    compute_coi(data2, "sim", coi_method = "frequency",
                use_bins = FALSE, seq_error = 0.03)$coi)
}

comp <- replicate(100, fgh(), simplify = TRUE)
rowMeans(comp)
6.42 6.22   
```

So the Binomial likelihood approach gives slightly higher estimates for the same samples. Not sure it will make a big difference, but just flagging it here as a reminder to ask Bob for his thoughts on this versus the simple residual approach when we go through the paper.

@OJWatson OJWatson added the feature ✨ feature request or enhancement label Feb 24, 2022