Render tex (text and math) in the app #32

Draft · wants to merge 1 commit into master
4 changes: 1 addition & 3 deletions notes/notes.tex
@@ -56,9 +56,7 @@
The benefit of this Gaussian model is that this can be solved exactly.
The minimum is obtained by setting the derivative with respect to $p_j$ to 0.
\begin{equation}
\partial_j logL(\{p_i\} | \{n_i\}, \{k_i\}) = n_j^2\frac{(k_j - p_j n_j)}{(k_j+p_c)(n_j-k_j+p_c)}
% - \lambda (p_j - p_{j-1}) + \lambda (p_{j+1} - p_{j})
- \lambda (2p_j - p_{j-1}-p_{j+1}) = 0
\partial_j logL(\{p_i\} | \{n_i\}, \{k_i\}) = n_j^2\frac{(k_j - p_j n_j)}{(k_j+p_c)(n_j-k_j+p_c)} - \lambda (2p_j - p_{j-1}-p_{j+1}) = 0
\end{equation}
For $j=0$ the last term is just $\lambda (p_1 - p_0)$; for $p_m$ it is $-\lambda (p_m - p_{m-1})$.
This is a very sparse linear system of equations that, owing to the stiffness constraint, should be non-degenerate.
3 changes: 2 additions & 1 deletion web/config/next/next.config.ts
@@ -7,6 +7,7 @@ import remarkMath from 'remark-math'
import remarkToc from 'remark-toc'
import remarkSlug from 'remark-slug'
import remarkImages from 'remark-images'
import rehypeKatex from 'rehype-katex'

import { findModuleRoot } from '../../lib/findModuleRoot'
import { getGitBranch } from '../../lib/getGitBranch'
@@ -134,7 +135,7 @@ const withMDX = getWithMDX({
// },
// ],
],
rehypePlugins: [],
rehypePlugins: [rehypeKatex],
},
})

2 changes: 2 additions & 0 deletions web/package.json
@@ -83,6 +83,7 @@
"intersection-observer": "0.12.2",
"is-absolute-url": "4.0.1",
"iso-3166-1-alpha-2": "1.0.1",
"katex": "0.16.8",
"lodash-es": "4.17.21",
"luxon": "3.2.1",
"marked": "4.2.12",
@@ -116,6 +117,7 @@
"recoil-persist": "4.2.0",
"reflect-metadata": "0.1.13",
"regenerator-runtime": "0.13.11",
"rehype-katex": "7.0.0",
"resize-observer-polyfill": "1.5.1",
"route-parser": "0.0.5",
"serialize-javascript": "6.0.1",
3 changes: 2 additions & 1 deletion web/src/components/Home/HomePage.tsx
@@ -8,6 +8,7 @@ import { MdxContent } from 'src/i18n/getMdxContent'
import { getDataRootUrl } from 'src/io/getDataRootUrl'
import { PageContainerNarrow } from 'src/components/Layout/PageContainer'
import { PathogenCard } from 'src/components/Home/PathogenCard'
import 'katex/dist/katex.min.css'

export interface IndexJson {
pathogens: Pathogen[]
@@ -31,7 +32,7 @@ export function HomePage() {
</Row>
<Row noGutters>
<Col>
<MdxContent filepath="Home.md" />
<MdxContent filepath="Notes.md" />
</Col>
</Row>
</PageContainerNarrow>
71 changes: 71 additions & 0 deletions web/src/content/en/Notes.md
@@ -0,0 +1,71 @@
---
author:
- Richard A. Neher
title: Estimating frequencies
---

Estimating frequency trajectories from count data is a recurring problem
in many of our analyses. We previously fitted to a sampling likelihood
with a stiffness prior of the type
$$L(\{p_i\} | \{n_i\}, \{k_i\}) \sim \prod_i p_i^{k_i} (1-p_i)^{n_i-k_i} \times \prod_i e^{ - \lambda (p_i - p_{i-1})^2/2}$$
where the stiffness $\lambda$ might want to depend on $p_i$ itself. If
regularization is diffusive, we expect the parameter $\lambda$ to scale
as $\lambda = 1/(2D(t_i-t_{i-1}))$. This non-linear likelihood is
difficult to optimize. Since we don't really think that the data we have
is well described by iid sampling, we might just as well approximate the
binomial sampling LH by a Gaussian with mean $\hat{p}_i = k_i/n_i$ and
variance $$\sigma^2 = \frac{k_i(n_i-k_i)}{n_i^3}$$ Note that this
variance diverges when $n_i=0$, corresponding to the case when there is
no information. The log-likelihood then looks like this
$$logL(\{p_i\} | \{n_i\}, \{k_i\}) = C + \sum_i n_i\frac{(k_i - p_i n_i)^2}{2k_i(n_i-k_i)} + \sum_i \lambda (p_i - p_{i-1})^2/2$$
We still need to guard this against cases where $k_i=0$ or $k_i=n_i$, or
more generally cases where the binomial is not well approximated by a
Gaussian. This can be achieved by simply inflating the variance -- the
variance is anyway underestimated by the iid approximation. Here, we add
pseudo-counts $p_c$ to both terms of the variance.
$$logL(\{p_i\} | \{n_i\}, \{k_i\}) = C + \sum_i n_i\frac{(k_i - p_i n_i)^2}{2(k_i+p_c)(n_i-k_i+p_c)} + \sum_i \lambda (p_i - p_{i-1})^2/2$$
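As an illustrative aside (not part of this commit), here is a minimal numpy sketch of this objective; the count arrays, the stiffness `lam` and the pseudo-count `pc=0.3` are made-up assumptions, not values from the notes.

```python
import numpy as np

def penalized_objective(p, n, k, lam, pc=0.3):
    """Gaussian data term with pseudo-counts plus the stiffness penalty
    on successive frequency differences (the constant C is dropped)."""
    data = np.sum(n * (k - p * n) ** 2 / (2 * (k + pc) * (n - k + pc)))
    stiffness = lam * np.sum(np.diff(p) ** 2) / 2
    return data + stiffness

# made-up monthly counts; the empty bin (n=0) contributes nothing to the data term
n = np.array([120.0, 80.0, 0.0, 95.0, 140.0])
k = np.array([12.0, 10.0, 0.0, 20.0, 42.0])
print(penalized_objective(k / np.maximum(n, 1.0), n, k, lam=100.0))
```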
The next question is how the stiffness $\lambda$ should scale with the
number of samples and the time discretization. If our null is that the
true frequencies change diffusively, $\lambda \sim 1/D\Delta t$. The
absolute value of the stiffness should be related to how much we expect
frequencies to change from one month to the next. For influenza, this is
typically on the order of $\sqrt{Dt} \sim 0.1$, which suggests that $1/(Dt)$
should be on the order of 100 when $t$ is one month. A month is 30 days, so with
time measured in days a value around 1000 seems sensible.
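As a quick sanity check on these orders of magnitude (the numbers are purely illustrative):

```python
# sqrt(D * t_month) ~ 0.1 over one month  =>  D * t_month ~ 0.01
t_month = 30.0                      # days
D = 0.01 / t_month                  # ~ 3.3e-4 per day
lam_monthly = 1.0 / (D * t_month)   # ~ 100 for monthly bins
lam_daily = 1.0 / (D * 1.0)         # ~ 3000 for daily bins, i.e. order 10^3
```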

The benefit of this Gaussian model is that it can be solved exactly.
The minimum is obtained by setting the derivative with respect to $p_j$
to 0.
$$\partial_j logL(\{p_i\} | \{n_i\}, \{k_i\}) = n_j^2\frac{(k_j - p_j n_j)}{(k_j+p_c)(n_j-k_j+p_c)} - \lambda (2p_j - p_{j-1}-p_{j+1}) = 0$$
For $j=0$ the last term is just $\lambda (p_1 - p_0)$; for $p_m$ it is
$-\lambda (p_m - p_{m-1})$. This is a very sparse linear system of
equations that, owing to the stiffness constraint, should be non-degenerate.
Rearranging this to separate the constant and $p_i$-dependent parts yields
$$\frac{n_j^3}{(k_j+p_c)(n_j-k_j+p_c)}p_j + \lambda (2p_j - p_{j-1} - p_{j+1}) = \frac{n_j^2 k_j}{(k_j+p_c)(n_j-k_j+p_c)}$$
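A sketch of how this tridiagonal system could be assembled and solved with scipy; the function name `fit_frequencies` and the pseudo-count default are choices made here for illustration, not part of the notes.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import spsolve

def fit_frequencies(n, k, lam, pc=0.3):
    """Solve a_j p_j + lam (2 p_j - p_{j-1} - p_{j+1}) = b_j for the
    frequency trajectory p, with reduced stiffness terms at the ends."""
    denom = (k + pc) * (n - k + pc)
    a = n**3 / denom                 # coefficient of p_j from the data term
    b = n**2 * k / denom             # right-hand side from the data term

    m = len(n)
    main = a + 2.0 * lam
    main[0] -= lam                   # j=0 and j=m have only one neighbour
    main[-1] -= lam
    off = -lam * np.ones(m - 1)
    A = diags([off, main, off], offsets=[-1, 0, 1], format="csc")
    return spsolve(A, b)

# p = fit_frequencies(n, k, lam=1000.0)
```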

The second derivative of the objective function with respect to
frequencies would have the diagonal elements
$$\frac{n_i^3}{(k_i+p_c)(n_i-k_i+p_c)} + 2\lambda$$ and off-diagonal
elements $(i, i+1)$ equal to $-\lambda$. The confidence intervals would then
be the square root of the diagonal of the inverse of this Hessian.
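A corresponding sketch of this error estimate (same illustrative pseudo-count as above); for trajectories of a few dozen bins a dense inverse is unproblematic.

```python
import numpy as np
from scipy.sparse import diags

def frequency_errors(n, k, lam, pc=0.3):
    """Square roots of the diagonal of the inverse Hessian.  The diagonal
    uses 2*lam throughout, as in the text; strictly, the two boundary
    bins would carry lam instead."""
    denom = (k + pc) * (n - k + pc)
    main = n**3 / denom + 2.0 * lam
    off = -lam * np.ones(len(n) - 1)
    H = diags([off, main, off], offsets=[-1, 0, 1]).toarray()
    return np.sqrt(np.diag(np.linalg.inv(H)))
```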

Hierarchical frequencies
========================

Different countries in a region are pretty well mixed but deviate from
the region-average frequency. It thus makes sense to tie them together.
If we express the frequency in a country as
$p_c(t_i) = p_r(t_i) + \Delta p_c(t_i)$ we can regularize the
$\Delta p_c$ with a quadratic penalty. The Gaussian logLH is thus
$$logL(\{p_i\} | \{n_i\}, \{k_i\}) = C + \sum_i \sum_c n^c_i\frac{(k^c_i - (p_i + \Delta p^c) n^c_i)^2}{2(k^c_i+p_c)(n^c_i-k^c_i+p_c)} + \sum_i \lambda (p_i - p_{i-1})^2/2 + \frac{1}{2}\sum_i\sum_c \left(\lambda (\Delta p^c_i - \Delta p^c_{i-1})^2 +\mu (\Delta p^c_{i})^2\right)$$
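A sketch of this objective for $C$ countries and $m$ time bins, with `dp`, `n`, `k` stored as arrays of shape `(C, m)`; `lam`, `mu` and the pseudo-count are again illustrative parameters.

```python
import numpy as np

def hierarchical_objective(p, dp, n, k, lam, mu, pc=0.3):
    """p: region trajectory of length m; dp, n, k: per-country arrays of
    shape (C, m).  Data term plus stiffness on p and on each dp, plus the
    mu-penalty that pulls the country deviations towards zero."""
    pred = p[None, :] + dp
    data = np.sum(n * (k - pred * n) ** 2 / (2 * (k + pc) * (n - k + pc)))
    stiff_p = lam * np.sum(np.diff(p) ** 2) / 2
    stiff_dp = (lam * np.sum(np.diff(dp, axis=1) ** 2) + mu * np.sum(dp**2)) / 2
    return data + stiff_p + stiff_dp
```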
The derivatives with respect to the $p_i$ and $\Delta p_i^c$ now look
somewhat different. The derivative with respect to $p_j$ is
$$-\sum_c (n^c_j)^2\frac{(k^c_j - (p_j + \Delta p^c_j) n^c_j)}{(k^c_j+p_c)(n^c_j-k^c_j+p_c)} + \lambda (2p_j - p_{j-1}- p_{j+1}) = 0$$
and can be rearranged to
$$\sum_c \frac{(n^c_j)^3}{(k^c_j+p_c)(n^c_j-k^c_j+p_c)} (p_j + \Delta p^c) + \lambda (2p_j - p_{j-1}- p_{j+1}) = \sum_c \frac{(n^c_j)^2 k^c_j}{(k^c_j+p_c)(n^c_j-k^c_j+p_c)}$$
This will generate a number of off-diagonal terms, but it should still
be a pretty sparse matrix.

The derivative with respect to $\Delta p^d_j$ is
$$-(n^d_j)^2\frac{(k^d_j - (p_j + \Delta p^d_j) n^d_j)}{(k^d_j+p_c)(n^d_j-k^d_j+p_c)} + \lambda (2\Delta p^d_j - \Delta p^d_{j-1}- \Delta p^d_{j+1}) + \mu \Delta p_j^d = 0$$
which can be rearranged to
$$\frac{(n^d_j)^3}{(k^d_j+p_c)(n^d_j-k^d_j+p_c)}(p_j + \Delta p^d_j) + \lambda (2\Delta p^d_j - \Delta p^d_{j-1}- \Delta p^d_{j+1}) + \mu \Delta p_j^d = \frac{k^d_j(n^d_j)^2}{(k^d_j+p_c)(n^d_j-k^d_j+p_c)}$$
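Putting both sets of equations together, here is a sketch of how the joint block-sparse system could be assembled and solved; the unknown ordering, the helper names and the pseudo-count default are choices made here for illustration.

```python
import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.linalg import spsolve

def fit_hierarchical(n, k, lam, mu, pc=0.3):
    """n, k: per-country counts of shape (C, m).  Unknowns are ordered as
    [p_0 .. p_{m-1}, Dp^0_0 .. Dp^0_{m-1}, Dp^1_0 .., ...]."""
    n = np.asarray(n, float)
    k = np.asarray(k, float)
    C, m = n.shape
    denom = (k + pc) * (n - k + pc)
    A = n**3 / denom          # coefficient of (p_j + Dp^c_j) in each equation
    B = n**2 * k / denom      # right-hand side contributions

    N = (1 + C) * m
    M = lil_matrix((N, N))
    rhs = np.zeros(N)

    def add_stiffness(block, weight):
        # second-difference penalty acting on one block of m unknowns
        for j in range(m):
            M[block + j, block + j] += weight * (2 if 0 < j < m - 1 else 1)
            if j > 0:
                M[block + j, block + j - 1] -= weight
            if j < m - 1:
                M[block + j, block + j + 1] -= weight

    add_stiffness(0, lam)                     # region trajectory p
    for c in range(C):
        add_stiffness((1 + c) * m, lam)       # country deviation Dp^c
        for j in range(m):
            r_reg, r_c = j, (1 + c) * m + j
            M[r_reg, r_reg] += A[c, j]        # region equation
            M[r_reg, r_c] += A[c, j]
            rhs[r_reg] += B[c, j]
            M[r_c, r_reg] += A[c, j]          # country equation
            M[r_c, r_c] += A[c, j] + mu       # mu pulls Dp^c towards zero
            rhs[r_c] += B[c, j]

    x = spsolve(M.tocsc(), rhs)
    return x[:m], x[m:].reshape(C, m)         # region p, deviations Dp^c
```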
1 change: 1 addition & 0 deletions web/src/pages/_app.tsx
@@ -26,6 +26,7 @@ import { RegionPage } from 'src/components/Regions/RegionPage'
import { VariantPage } from 'src/components/Variants/VariantPage'
import { localeAtom } from 'src/state/locale.state'
import 'src/styles/global.scss'
import 'katex/dist/katex.css'

const NotFoundPage = dynamic(() => import('src/pages/404'))
