Basecalling models #1169

bolbolbalal-del · 2024-12-10T14:24:57Z

Hi,

I have some questions regarding the loss calculation and its underlying concepts.

Is the following correct? The LinearCRFEncoder layer models transition probabilities from a state to a specific base. Here a state is a base sequence of length state_len, and all possible combinations are modelled; each state corresponds to an index that is a multiple of 5 in the output's feature dimension. The four values in between are the transition probabilities from that state to the bases A, C, G and T.
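A minimal sketch of the layout described above, assuming n_base = 4, one block of n_base + 1 feature values per state (the state's own value first, then one value per base), and states indexed with the oldest base as the most significant base-4 digit. The helper names (`n_features`, `block_indices`, `kmer`, `successor`) are hypothetical and not taken from bonito or koi.

```python
from __future__ import annotations

# Hypothetical helpers for one reading of the LinearCRFEncoder output layout.
# Assumed: 4 bases, n_base**state_len states, each owning n_base + 1 values
# (block index 0 = the state's own value, indices 1..4 = bases A, C, G, T).
ALPHABET = "ACGT"


def n_features(n_base: int, state_len: int) -> int:
    """Size of the feature dimension: one block of n_base + 1 values per state."""
    return n_base ** state_len * (n_base + 1)


def block_indices(state: int, n_base: int) -> tuple[int, list[int]]:
    """Feature index of the state's own value and of its n_base per-base values."""
    start = state * (n_base + 1)  # the multiples of n_base + 1 (5 for DNA)
    return start, list(range(start + 1, start + n_base + 1))


def kmer(state: int, n_base: int, state_len: int) -> str:
    """Decode a state index into its k-mer; the oldest base is the most significant digit."""
    bases = []
    for _ in range(state_len):
        state, digit = divmod(state, n_base)
        bases.append(ALPHABET[digit])
    return "".join(reversed(bases))


def successor(state: int, base: int, n_base: int, state_len: int) -> int:
    """State reached by appending `base`: drop the oldest base, shift, add the new one."""
    return (state * n_base + base) % n_base ** state_len
```

For state_len = 3 this gives 64 states and 320 features, and `successor(6, 3, 4, 3)` maps the state ACG to CGT.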

What is the general idea behind the loss function? Since the model's output and the targets are not aligned, is the idea of the CTC loss applied, with the CRF encoder used to address the assumption that the per-timestep predictions are independent?
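The loss question can be made concrete with a brute-force toy, assuming state_len = 1 (the state is the last emitted base) and a per-frame layout of one stay score plus four move scores per state. Under those assumptions, -log P(y | x) is logZ over all paths minus logZ over the paths whose emitted bases collapse to the target y. Summing over the many alignments of y is the CTC part; normalising over whole paths rather than per frame is the CRF part. Enumerating every path is exponential in T and only makes the definition explicit; the function names are hypothetical and this is not bonito's loss code.

```python
import itertools
import math
import random

N_BASE = 4  # toy alphabet A, C, G, T; state_len = 1, so the state is the last base


def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def enumerate_paths(scores):
    """Yield (emitted base tuple, path score) for every path.

    scores[t][s][0] is the stay score in state s at frame t,
    scores[t][s][1 + b] the score for moving from s and emitting base b.
    """
    T = len(scores)
    for start in range(N_BASE):  # the initial state is summed over
        for actions in itertools.product(range(N_BASE + 1), repeat=T):
            state, total, emitted = start, 0.0, []
            for t, a in enumerate(actions):
                total += scores[t][state][a]
                if a > 0:  # a move emits a base and changes the state
                    state = a - 1
                    emitted.append(state)
            yield tuple(emitted), total


def crf_nll(scores, target):
    """-log P(target | x) = logZ(all paths) - logZ(paths that collapse to target)."""
    all_scores, target_scores = [], []
    for emitted, s in enumerate_paths(scores):
        all_scores.append(s)
        if emitted == tuple(target):
            target_scores.append(s)
    return logsumexp(all_scores) - logsumexp(target_scores)
```

Because every path emits exactly one base sequence, `exp(-crf_nll(scores, y))` summed over all possible targets y (lengths 0 to T) equals 1.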

Is there a way to get deeper insight into how the CTC calculation works in the low-level functions of the koi library (e.g. the forward_backward_implementation)?

How are the stay_scores and move_scores used?
What concepts do the alpha and beta matrices in the LogZ forward function implement?
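A sketch of one possible reading, offered as an assumption rather than a description of koi: once the target is fixed, the allowed paths form a linear chain over target positions, and `stay_scores` / `move_scores` are read here as per-frame scores for staying at position j or advancing from j to j + 1. Under that reading, alpha is the standard forward variable and beta the backward variable; both give the same logZ, and alpha + beta (normalised by logZ) gives posterior occupancies of the kind that gradients of logZ are built from. The function names and shapes are hypothetical.

```python
import math
import random


def logaddexp(a, b):
    """log(exp(a) + exp(b)), safe for -inf."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def forward(stay, move, n_pos):
    """alpha[t][j]: log-sum of scores of all partial paths at position j after t frames.

    stay[t][j]: score for remaining at target position j during frame t.
    move[t][j]: score for advancing from position j to j + 1 (j < n_pos - 1).
    """
    T = len(stay)
    alpha = [[-math.inf] * n_pos for _ in range(T + 1)]
    alpha[0][0] = 0.0  # every path starts at the first position
    for t in range(T):
        for j in range(n_pos):
            a = alpha[t][j] + stay[t][j]
            if j > 0:
                a = logaddexp(a, alpha[t][j - 1] + move[t][j - 1])
            alpha[t + 1][j] = a
    return alpha


def backward(stay, move, n_pos):
    """beta[t][j]: log-sum of scores of all completions from position j at frame t."""
    T = len(stay)
    beta = [[-math.inf] * n_pos for _ in range(T + 1)]
    beta[T][n_pos - 1] = 0.0  # every path must end at the last position
    for t in range(T - 1, -1, -1):
        for j in range(n_pos):
            b = stay[t][j] + beta[t + 1][j]
            if j + 1 < n_pos:
                b = logaddexp(b, move[t][j] + beta[t + 1][j + 1])
            beta[t][j] = b
    return beta
```

`forward(...)[T][n_pos - 1]` and `backward(...)[0][0]` are both logZ for the chain, and for every frame t the occupancies `exp(alpha[t][j] + beta[t][j] - logZ)` sum to 1 over j.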

Is the idea behind this specific CTC-CRF implementation described in a paper, or is there another source that would allow it to be reconstructed for research purposes?

Is there a similarity between the loss function's calculations and a Hidden Markov Model? Is the loss a special case of one?
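One way to see the relation to an HMM, again under the toy assumptions state_len = 1 and stay/move scores folded into a single transition score per frame: the recursion that computes logZ is the HMM forward algorithm run on unnormalised scores. If each state's outgoing scores are turned into log-probabilities (local normalisation, as in an HMM), logZ becomes 0 and the scores directly define a probability distribution over paths; with raw network outputs, the CRF instead normalises once, globally, by Z. Emissions are folded into the transition scores here, and the names are hypothetical.

```python
import math
import random

N_BASE = 4  # toy alphabet; the state is the last emitted base


def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_partition(scores, log_init):
    """Forward algorithm over states.

    scores[t][s][0]: stay in s at frame t; scores[t][s][1 + b]: move from s to base b.
    """
    alpha = list(log_init)
    for frame in scores:
        terms = [[] for _ in range(N_BASE)]
        for s in range(N_BASE):
            terms[s].append(alpha[s] + frame[s][0])  # stay
            for b in range(N_BASE):
                terms[b].append(alpha[s] + frame[s][1 + b])  # move s -> b
        alpha = [logsumexp(t) for t in terms]
    return logsumexp(alpha)


def normalise_row(row):
    """Turn one state's outgoing scores into log-probabilities."""
    z = logsumexp(row)
    return [x - z for x in row]


def locally_normalise(scores):
    """HMM-style local normalisation of every state's outgoing scores."""
    return [[normalise_row(row) for row in frame] for frame in scores]
```

With a uniform initial distribution, `log_partition(locally_normalise(raw), uniform)` is 0 up to rounding, while `log_partition(raw, uniform)` generally is not.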

Thank you, and best regards

malton-ont · 2024-12-24T11:07:57Z

@bolbolbalal-del,

You may find this comment thread useful. Questions about model development are generally better placed on the bonito GitHub repository.

malton-ont closed this as completed Dec 24, 2024
