Simplex read id in multiple duplex pairs #389

Open
davidebolo1993 opened this issue Oct 2, 2023 · 23 comments

Comments

@davidebolo1993

Hi guys,

We are investigating our first duplex run. I've read the useful discussions in #316 and #327 but couldn't find an obvious explanation for what we see.
From the docs and issues, we understand that a simplex read (let's call it r) that is also part of a duplex pair is tagged dx:i:-1, and that the corresponding duplex read (let's call it d) is named in the form r;t (r and t are the parent read names) and is tagged dx:i:1.
An example is this read here (0ee988dd-2227-47f7-ab19-99acfc66d686), with the corresponding tags.

d69f94b2-51d2-4c61-8c3b-7104c6cccc2a;0ee988dd-2227-47f7-ab19-99acfc66d686	1
0ee988dd-2227-47f7-ab19-99acfc66d686	-1

So far so good, and indeed most of the simplex reads having a duplex pair follow this scheme.

There are, however, simplex reads (dx:i:-1) that belong to multiple duplex pairs, so that the read r appears in a first duplex r;t and in a second duplex q;r.
An example is this read here (d69f94b2-51d2-4c61-8c3b-7104c6cccc2a):

d69f94b2-51d2-4c61-8c3b-7104c6cccc2a;0ee988dd-2227-47f7-ab19-99acfc66d686	1
ed2df147-bb5c-4215-98e6-69b7ed90b01c;d69f94b2-51d2-4c61-8c3b-7104c6cccc2a	1
d69f94b2-51d2-4c61-8c3b-7104c6cccc2a	-1

What is happening here? Do the two duplex ids both refer to the same template read d69f94b2-51d2-4c61-8c3b-7104c6cccc2a, each being a partial duplex of a different part of it? Something like this (#327 (comment)), but at different ends? I'm just guessing, as I couldn't find anything related to this. Sorry if I missed it.
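A minimal sketch of how such reads could be listed, assuming the read names and dx tags have already been extracted into (name, dx) tuples like the listings above, and that a duplex name is the two parent ids joined by `;`. The function name `parents_in_multiple_pairs` is made up for illustration and is not part of dorado.

```python
from collections import defaultdict

def parents_in_multiple_pairs(records):
    """records: iterable of (read_name, dx) tuples.

    Returns {parent_id: [duplex_name, ...]} for every parent read id that
    occurs in more than one distinct duplex read name.
    """
    pairs_by_parent = defaultdict(set)
    for name, dx in records:
        if dx != 1:
            continue  # only duplex records carry two parent ids in their name
        for parent in name.split(";"):
            pairs_by_parent[parent].add(name)
    return {
        parent: sorted(names)
        for parent, names in pairs_by_parent.items()
        if len(names) > 1
    }

# The three records from the example above.
records = [
    ("d69f94b2-51d2-4c61-8c3b-7104c6cccc2a;0ee988dd-2227-47f7-ab19-99acfc66d686", 1),
    ("ed2df147-bb5c-4215-98e6-69b7ed90b01c;d69f94b2-51d2-4c61-8c3b-7104c6cccc2a", 1),
    ("d69f94b2-51d2-4c61-8c3b-7104c6cccc2a", -1),
]
shared = parents_in_multiple_pairs(records)
for parent, duplex_names in shared.items():
    print(parent, len(duplex_names))
```

Only the read d69f94b2-51d2-4c61-8c3b-7104c6cccc2a is reported here, since it is the only id present in two different duplex names.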

Thanks,

Davide

@tijyojwad
Collaborator

Hi @davidebolo1993 , thanks for flagging this!

Most likely this is happening because our pairing heuristic is mis-identifying one of the pairs. The pairing algorithm takes into account multiple factors (read lengths of template and complement, their overlap, the position of the overlap, etc.). We're constantly improving those heuristics (we'll be releasing a newer version soon), but some false positive pairs might get through.

In most cases these false positive pairs eventually get discarded due to low q-score. I'm wondering if that's also the case for you. What mean q-score do you see for these 2 duplex reads that share a common simplex ancestor? The qs tag in the BAM record should tell you the mean Q score. The false pair should have a pretty low number for that.
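As a rough sketch, the qs and dx tags can be pulled out of a SAM text line (for example, one produced by converting the BAM to text) using the standard SAM layout: 11 mandatory tab-separated columns followed by optional TAG:TYPE:VALUE fields. The helper name `parse_sam_tags` and the toy record are illustrative assumptions.

```python
def parse_sam_tags(sam_line):
    """Return (read_name, {tag: value}) for one SAM text line."""
    fields = sam_line.rstrip("\n").split("\t")
    name = fields[0]
    tags = {}
    for field in fields[11:]:  # optional fields start at column 12
        tag, typ, value = field.split(":", 2)  # value may itself contain ':'
        if typ == "i":
            tags[tag] = int(value)
        elif typ == "f":
            tags[tag] = float(value)
        else:
            tags[tag] = value  # Z, A, H, B are kept as plain strings here
    return name, tags

# Toy unaligned record with a few of the tags seen in this thread.
line = "\t".join([
    "read-1", "4", "*", "0", "0", "*", "*", "0", "0", "ACGT", "IIII",
    "qs:i:17", "dx:i:1", "st:Z:2023-10-11T01:29:42.879+00:00",
])
name, tags = parse_sam_tags(line)
print(name, tags["dx"], tags["qs"])
```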

@davidebolo1993
Author

Hi @tijyojwad,

Here you have it:

d69f94b2-51d2-4c61-8c3b-7104c6cccc2a;0ee988dd-2227-47f7-ab19-99acfc66d686	1	17
ed2df147-bb5c-4215-98e6-69b7ed90b01c;d69f94b2-51d2-4c61-8c3b-7104c6cccc2a	1	33
d69f94b2-51d2-4c61-8c3b-7104c6cccc2a	-1	23

One of them definitely has higher quality. Do you have a suggested threshold for filtering these results for the time being? Maybe filtering when a simplex read has higher quality than one of its duplex reads (if there are multiple ones)? Just asking, because I don't think a fixed threshold will work.

Thanks,

Davide

@tijyojwad
Collaborator

Oh this is interesting - I would have expected one of them to be quite low (< 10), so setting a --min-qscore threshold during basecalling would get rid of those. It's clear that the second one is the right duplex pair, but I agree we can't obviously distinguish based on qscore here.

Are there quite a few of these? I would expect these to be quite rare, so for now a heuristic to just keep the pair with the higher qscore should be sufficient. Once we release the new version (within a week) it'd be great if you can rerun and see if the problem persists.
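The "keep the pair with the higher qscore" heuristic could be sketched as below. This is an illustration only and not something dorado does itself; the function name `pick_best_duplex` and the input layout of (duplex name, qs value) tuples are assumptions, and it presumes duplex names are the two parent ids joined by `;` as in the listings above.

```python
from collections import defaultdict

def pick_best_duplex(duplex_reads):
    """duplex_reads: iterable of (duplex_name, qscore).

    For every parent id shared by several duplex reads, keep only the
    duplex read with the highest qscore. Returns (keep, drop) name sets.
    """
    duplex_reads = list(duplex_reads)
    by_parent = defaultdict(list)
    for name, qs in duplex_reads:
        for parent in name.split(";"):
            by_parent[parent].append((qs, name))

    drop = set()
    for candidates in by_parent.values():
        if len(candidates) > 1:
            best_name = max(candidates)[1]  # highest qscore, ties broken by name
            drop.update(n for _, n in candidates if n != best_name)

    keep = {name for name, _ in duplex_reads} - drop
    return keep, drop

# The two duplex reads from the example above, with their qs values.
reads = [
    ("d69f94b2-51d2-4c61-8c3b-7104c6cccc2a;0ee988dd-2227-47f7-ab19-99acfc66d686", 17),
    ("ed2df147-bb5c-4215-98e6-69b7ed90b01c;d69f94b2-51d2-4c61-8c3b-7104c6cccc2a", 33),
]
keep, drop = pick_best_duplex(reads)
print("keep:", sorted(keep))
print("drop:", sorted(drop))
```

Note that a duplex read dropped because of one parent stays dropped even if it was the only candidate for its other parent, which is a deliberately conservative choice in this sketch.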

@davidebolo1993
Author

We will have a look at this with the most recent version and report back!

Thanks,

Davide

@tijyojwad
Collaborator

Any update on this @davidebolo1993 ?

@HalfPhoton
Collaborator

Closing this as there have been no updates in a month - please re-open if your issue is still ongoing.

@minefield47

minefield47 commented Feb 12, 2024

@HalfPhoton @tijyojwad I can give an update as I am currently having the same issue!

Here are my current "duplicate" reads from my two libraries in which a read is being used as both a template and a complement. I additionally added on the parent simplex read length and quality. Let me know if you need anything else!
image

image

Previously, I had a false positive rate (about 10 for the second image/library). However, after seeing #617, I swapped away from the Turing GPUs available to me and have since re-basecalled my libraries using [email protected] & [email protected] with Dorado v0.5.1+a7fb3e3 on A100 GPUs.

Thank you,

@tijyojwad
Collaborator

Hi @minefield47 thank you for reporting the issue.

The underlying cause is likely the same one highlighted in #389 (comment) . I think a reasonable heuristic is to pick the one with higher duplex Q score as the "correct" pair and discard the other one. We've considered applying some heuristics to pick the "right" one in case of duplicates, but due to its rare occurrence we haven't prioritized it yet.

I had a false positive rate (about ~10 for the second image/library)

Is this 10 out of the whole dataset? What percentage is that?

after seeing #617, I swapped from utilizing the Turing GPUs available to me, I have since re-basecalled my libraries using the

We're investigating the highlighted discrepancy, but in general I would expect very minor differences in the output. Are you seeing significant changes after moving to A100? What were you running on before?

@minefield47

minefield47 commented Feb 12, 2024

@tijyojwad

I have read the pairing code previously and completely agree that is what is going on. I will get a script going to filter out the lower-scoring duplicates. Depending on the read length of the false positive, I might look into finding the simplex parents for utilization during our de novo assembly, but I am not sure it will be worth it, as the read is already being mapped to a duplex. Over the next few weeks we will be trying different assemblers such as NECAT, Canu, and Flye. For assembly, is polishing still a thing or is that now a thing of the past? I have been seeing a mix of papers that fall on one side or the other when assembling.

That is 10 out of the whole dataset of ~2M reads, so still extremely low. Sorry for the confusion.
I don't have the duplicates for my whole dataset but here are the duplicates from a single library done on "random" GPUs (see below).
image

I was previously utilizing an ensemble of RTX2080s, GTX1080s, A10s, A30s, and A100s. My workflow is to subset the pod5 files by channel and run Dorado on an HPC as an LSF job array, where each channel is assigned a part of an available GPU in the above list. So it is difficult to determine where the differences are truly coming from. I stopped basecalling my second library halfway through to restart on only the A100, so I can only compare a single library.

Here are the total differences (the simplex df includes duplex parents):
Random GPUs:
image
A100 only:
image
Here is the N50 difference:
image
Here is the total number of bases called:
image

If I can get you any more information let me know!

@minefield47

Additionally, I am currently running the dorado duplexing and then running the adapter/primer trimming function from these reads to create a trimmed bam file. Something I just noticed is that the file size of my individual channels do not seem to be changing. Does the duplex function automatically do adapter/primer trimming or is this not expected behavior?

@tijyojwad
Collaborator

Hi @minefield47 - this is super useful info, thank you so much!

Depending on the read length of the false positive, I might look into finding the simplex parents for utilization during our de novo assembly, but I am not sure it will be worth as the read is already being mapped to a duplex.

Another option might be to just remove the lower quality duplex read that has a duplicated parent and keep the simplex reads as they are. this way you reduce the chance of using an incorrect duplex pair but still get the support from reads covering that region.

For assembly, is polishing still a thing or is that now a thing of the past? I have been seeing a mix of papers that fall on one side or the other when assembling.

I think it's somewhat species and pipeline dependent. With duplex reads we're already correcting some of the errors so there will be fewer errors to polish later.

@tijyojwad
Collaborator

Also, @minefield47 would you be able to share what the "random GPU" in your run was? And just to confirm, both runs used the exact same pod5/model the only difference was the GPU used right?

@minefield47

minefield47 commented Feb 12, 2024

Also, @minefield47 would you be able to share what the "random GPU" in your run was?

It wasn't a single GPU model. I used an array of 512 jobs with IBM's LSF, where each pod5 channel was assigned to either an RTX2080, GTX1080, A10, A30, or A100 depending upon what was available on the cluster at that moment. So channel 1 might have been called on an A100, but channel 2 might have been called on an RTX2080, etc. After sequencing completed and everything was called, I removed my stderr/stdout, so I no longer know which channels were called on which GPU.

And just to confirm, both runs used the exact same pod5/model the only difference was the GPU used right?

Yes. Everything else was completely identical, I have the model downloaded to the cluster and basecall using it to prevent waiting for the model to download on every channel.

@minefield47

minefield47 commented Feb 12, 2024

Another option might be to just remove the lower quality duplex read that has a duplicated parent and keep the simplex reads as they are. this way you reduce the chance of using an incorrect duplex pair but still get the support from reads covering that region.

I am going to look into the two duplex reads/their parents and see where the differences are. An idea my PI just had was to keep these "false positives" in during assembly and see where they get mapped to. They might be from areas that are repetitive or the leading/trailing end difference between the two simplex reads might be high A/T regions that broke apart at different locations during library prep and we are seeing true differences. We will find out!

I think it's somewhat species and pipeline dependent. With duplex reads we're already correcting some of the errors so there will be fewer errors to polish later.

Sounds great, we have the compute power and are dealing with a genome that is expected to have a lot of repetitive elements. As such, we are going to run a bunch of different versions and get some comparisons.

I asked this a bit ago above but wanted to put it here so it does not get lost:

Additionally, I am currently running the dorado duplexing and then running the adapter/primer trimming function from these reads to create a trimmed bam file. Something I just noticed is that the file size of my individual channels do not seem to be changing. Does the duplex function automatically do adapter/primer trimming or is this not expected behavior?

Thank you for the quick replies! You and the rest of the team have been amazing help these past few weeks getting everything up and running.

@tijyojwad
Collaborator

the file size of my individual channels do not seem to be changing. Does the duplex function automatically do adapter/primer trimming or is this not expected behavior?

Are you working only with the duplex reads? In that case the adapters will be automatically trimmed off as a consequence of the overlaps created during pairing. But if you have simplex reads in your dataset as well, then I'd expect some trimming after running dorado trim.

@minefield47

When I asked this, I was going off memory of the previous libraries called on the ensemble of GPUs, which I have since removed off the cluster, so cannot examine it anymore.
I just checked and trimming does occur. I went into my new libraries and for some reason my trimmed bams are human readable while my untrimmed are in binary? I never bothered to even try opening the bams before so not sure if that is intended behavior?

Anyway, I converted the untrimmed BAMs to fastq and searched the untrimmed reads for randomly chosen trimmed reads; all 10 randomly chosen simplex-only reads had been trimmed.

Sorry for the confusion!

@tijyojwad
Collaborator

No problem!

trimmed bams are human readable while my untrimmed are in binary? I never bothered to even try opening the bams before so not sure if that is intended behavior?

The default output format is BAM, so both should be binary outputs.
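A quick way to tell which of the two a file actually is, without relying on the extension, is to look at its first bytes. This sketch relies only on the SAM/BAM specification (a BAM file is BGZF, i.e. gzip-compatible, and its decompressed stream starts with the magic bytes `BAM\1`); the function name `sniff_alignment_format` and the demo file names are made up for illustration.

```python
import gzip
import os
import tempfile

def sniff_alignment_format(path):
    """Heuristically classify a file as BAM, other gzip data, or plain text."""
    with open(path, "rb") as handle:
        magic = handle.read(2)
    if magic != b"\x1f\x8b":  # not gzip/BGZF, so it cannot be BAM
        return "text (possibly SAM)"
    with gzip.open(path, "rb") as handle:
        if handle.read(4) == b"BAM\x01":
            return "BAM"
    return "gzip-compressed, not BAM"

# Demo on two tiny stand-in files (not real alignments).
with tempfile.TemporaryDirectory() as tmp:
    bam_like = os.path.join(tmp, "demo_untrimmed.bam")
    with gzip.open(bam_like, "wb") as out:
        out.write(b"BAM\x01")
    sam_like = os.path.join(tmp, "demo_trimmed.bam")
    with open(sam_like, "w") as out:
        out.write("@HD\tVN:1.6\n")
    bam_result = sniff_alignment_format(bam_like)
    sam_result = sniff_alignment_format(sam_like)

print(bam_result)
print(sam_result)
```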

@minefield47

minefield47 commented Feb 14, 2024

Well...that is potentially a problem?

I went through both libraries and randomly sampled the trimmed "bams", and all of them are human-readable. They look like this:
image

Here is the deconstructed (i.e. I removed the variables for library path/channel_ID to make it easier to interpret) code:

```
dorado duplex \
    <path to model> \
    <path to pod5_by_channel>/channel#.pod5 \
    > <path to untrimmed bam>/channel#_untrimmed.bam

dorado trim <path to untrimmed bam>/channel#_untrimmed.bam \
    > <path to trimmed directory>/channel#_trimmed.bam
```

I have no error codes and they successfully output to fastq files that work for assembly (yay!)...

@davidebolo1993
Author

Hi @tijyojwad,

I finally found time to get back to you on this.

Our facility re-basecalled the experiment, this time using dorado v0.5.2.
I’m adding some reports below.

The .ubam from dorado duplex has 10288340 entries. Among these, we have:


  • 8680482 reads tagged with dx:i:0 - so, simplex only
  • 546841 reads tagged with dx:i:1 - so, duplex
  • 1061017 reads tagged with dx:i:-1 - so simplex having a duplex offspring

In a perfect scenario, the number of simplex reads having a duplex offspring would be double the number of duplex reads (so I expected this to be 1093682). The dx:i:-1 tagged reads are fewer than twice the dx:i:1 reads because 31825 simplex reads with duplex offspring appear in more than one duplex pair. In particular:

  • 31102 dx:i:-1 appear in 2 duplex pairs
  • 616 dx:i:-1 appear in 3 duplex pairs
  • 104 dx:i:-1 appear in 4 duplex pairs
  • 1 dx:i:-1 appear in 6 duplex
  • 2 dx:i:-1 appear in 8 duplex
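The numbers above can be checked with a little arithmetic, under the assumption that a simplex read appearing in k duplex pairs is counted once among the dx:i:-1 reads but k times when the duplex count is simply doubled, so each such read accounts for k - 1 "missing" reads. The variable names are just for illustration.

```python
duplex = 546841                   # reads tagged dx:i:1
simplex_with_offspring = 1061017  # reads tagged dx:i:-1

# number of duplex pairs a parent appears in -> number of such parents
multiplicity = {2: 31102, 3: 616, 4: 104, 6: 1, 8: 2}

expected = 2 * duplex
shortfall = expected - simplex_with_offspring
explained = sum((k - 1) * n for k, n in multiplicity.items())
shared_reads = sum(multiplicity.values())

print(expected)      # 1093682
print(shortfall)     # 32665
print(explained)     # 32665
print(shared_reads)  # 31825
```

Under that assumption the shortfall is fully accounted for by the reads that appear in more than one duplex pair.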

There are a couple of things that look weird to me here.

fa789c2c-50d6-478e-aa11-d214495148f7  0 19
fa789c2c-50d6-478e-aa11-d214495148f7  0 19
fa789c2c-50d6-478e-aa11-d214495148f7  0 19
fa789c2c-50d6-478e-aa11-d214495148f7 -1 19
fa789c2c-50d6-478e-aa11-d214495148f7;cd0ab1a3-86cd-4663-a552-d71a5c8eb5ce  1 22
fa789c2c-50d6-478e-aa11-d214495148f7;cd0ab1a3-86cd-4663-a552-d71a5c8eb5ce  1 22
fa789c2c-50d6-478e-aa11-d214495148f7;cd0ab1a3-86cd-4663-a552-d71a5c8eb5ce  1 22
fa789c2c-50d6-478e-aa11-d214495148f7;cd0ab1a3-86cd-4663-a552-d71a5c8eb5ce  1 22

These are the read_id, dx tag and qs tag for one of the reads appearing in 4 duplex pairs. The duplex pair is always the same but appears 4 times in the alignment (is there a reason for this? Do you have a sense of whether this affects downstream applications?).
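A small sketch of how repeated records and reads carrying more than one dx value could be tallied, assuming the (read name, dx) pairs have already been extracted from the file. The function name `summarize_dx` is hypothetical; the input mirrors the listing above (qs values omitted).

```python
from collections import Counter, defaultdict

def summarize_dx(records):
    """records: iterable of (read_name, dx) tuples.

    Returns (repeated, mixed):
      repeated: {(name, dx): count} for records seen more than once
      mixed:    {name: sorted dx values} for names with several dx values
    """
    counts = Counter(records)
    repeated = {key: n for key, n in counts.items() if n > 1}

    tags_by_name = defaultdict(set)
    for name, dx in counts:
        tags_by_name[name].add(dx)
    mixed = {
        name: sorted(tags)
        for name, tags in tags_by_name.items()
        if len(tags) > 1
    }
    return repeated, mixed

simplex = "fa789c2c-50d6-478e-aa11-d214495148f7"
duplex = simplex + ";cd0ab1a3-86cd-4663-a552-d71a5c8eb5ce"
records = [(simplex, 0)] * 3 + [(simplex, -1)] + [(duplex, 1)] * 4

repeated, mixed = summarize_dx(records)
print(repeated)
print(mixed)
```

For this listing, the simplex id is reported with both 0 and -1, and the dx:i:0 and dx:i:1 records are reported as repeated 3 and 4 times respectively.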

Also, the same read here is tagged both as simplex with no duplex offspring and as simplex with duplex offspring, and it appears in a duplex as well. I thought that the 0 and -1 tags were mutually exclusive, but that doesn't seem to be the case in this example.
Intersecting the unique read names for reads with dx:i:-1 and dx:i:0 tags resulted in 17529 reads that are flagged both dx:i:0 and dx:i:-1.
Attached is an example of a read having all 3 tags.

000a7f54-f079-4909-b056-324ac877d212	4	*	0	0	*	*	0	0	GTTATGTTGCATCTTACTTCGTTCAGTTACGTATTGCTATTGTGTAAGAACTTTGGCTTTTACTCTGAGTAAAATGGGGAGCCATTGGAGTGTTTGTGAACAGAGGAGTGACATGATCTCATCTTTTAAAAGGATTACTCTTACTTCTGTGTTGATAATAGACTATAATAGTGAAAACAGAAAGACCGATTAGTTACAATATTGCTATACTACAGAAAAGAGATAGTGATGACTTGGAAAGTGGTGGAGGAAGTAAAACGTGATCTGAATCTGGATGTAACGTGAAGATAGAGCAGGCAAGGTTTGTTGACAGATTAGATTTGCTATTTGAGAGAGAGCATAACTACAAAGTTTTTGGCTTGAGCCCCTGAAGAGATCGATTTGTTATTCTTTTATTAGGATGGGCAAGACTGTGGGTAGAGTAGGTTTTTGGAGGAAGATCAGGAGTTTAGTTTGGGGCATGTTAAGTTTGGCATGTCTAGTAGACACCCCATTGAAACTACTTTAGGTGTACTTTGTTGAAGTACATCCTTAGAATTTCTTTTGGTGAGAGTCTCTGGAAGGCAGACTTAATCTTTGTATATTTCAGATTGTCTTTTTCTTACCTATGCTTTTGAATGTACTTTAACGGGGTATACAATTCTGGTGGACAGTTGTTTTCCTTCACTGCTTTTGTTATTGCCAGTGAGAAACTTATGTTGGTATAATTGATGTTCCTTTGTAGATCTCCAGTTTTTTTTTCCTTTAAATCTCCTGTGGCTTGTAAGTTTTTTTATTAATCACAATTTTGCTATGCTTTTTCCAGGTATGGAACTATCGTTATGTATATACACACACATACTTTATGGTTTTTTCATTCTGTATACTGTTTTGGGTAAATTTCCCAGAAAACTATCTTCTAAGTCACGAATTCGTTAGAATTCATCTTATCTGTTTTATATGTGTGTGTGTATATATAGTTTATATATATTATATATAAAATATTATATATTATATATTATATATAAAATATTATATATTATATTATATATTATATATAAAATATTATATATTACATATTATATATTATATATAATATATTATATATTATATATTATATATTATATATAATATATTATATATTATATATTATATATAATATATTATATATTATATATTATATATAATATATAATATATTATATATTATATATAATATATTATATATAATATATAATATATTATATATAATATATAATATATTATATATTATATATAAAATATATTATATATAAAATATATTATATATAATATATATATTTATATATATTATATATAAAATATATTATATATAAAATATATATATTTATATATATTATATATAAAATATATAATATATATAATATTATATATTATATATAAAATGTTATATATTATATATAAAATATTATATATTATATATAAAATATACGTAAAATATATATTATACATAAAATATATATTATACATAAAATATATATTATACATATTATACATAAAATATATATTGTACATATTATTATATATAAAATATATATTGTACATATTATTATATATAAAATATATATTGTACATATTATTATATATAAAATATATATTATACATATTATATATAAAATATTATATATTATACATATTATATATAATATATATTTATATATATATTATATATATTATATATAATATATAATATATTATATATTATTATATACAATATATATATTATATATAAAATATACATATATAAAATATATAAAATATATATAAAATATACATATATAAAATATACATATATGTTATGTATATTATTTATATTGTGTATAAAATATACATATTATGTATATTATATATTATATATAAAATATACGTATATATTATGTATATTATATATATTATATATTATATATATATTTTTTAAGACAGGGTCTCA
CTCTGTCACCCAGGCTGGAGTGCCTTGGCATGATCTCAGCTCACTGCAACCTCTGCCTCTTGGGCTCAAGCGATCCTCCCACCTCAGCCTCCCAAGTAGCTGAGACTACAGGCAAGTGCCACCATGCCCAGCTAATTTTTGTATTTTTTTGTAGAGATAGTGTTTCACCATGTTGCCCAGGCTGGTCTCAAACTCCAGAGCTCACGTGATCTGCCTGCCTCGTCCTTCCAAAGTGCTGGGATTACAGGCGTGAGCCACCGTGCTTGGCTGAAAGTACACTTTTATTTCTAGTCTTTTTTATTTTTATACATGCCTGTTCTTATTTCACTACGGCTTGTTTTTCATGATTTCTTGTTCTTTATTTTTTTTTTTTTTTAAGGGACAGTGTCTCACTGTGTCATCCACACTGGAGTCTAGTGGTGTCATCATAACTCACTGTAGCCCTGAATTCCTGGGCTCAAGTGATCTTCCTGCTTTAGCCTCCTGAGTTGCTGAGACTACAGATGTGAGCCACCATGCCCAGTTAACTCTTCTTAAATAGAAGTTATTCCCTTATTTATCACTTCAGTATCCTAAACATAGTTATTTTAAGGTTTTTGTCAGACTGCTCCATAAAATTAGTTTCATATAGAGTGAATGTGTGTTATTACTGATTTTTTTAGGTTGTCTTTCTTAATATAAAATTTATTTCTATATTTTGAATTTTGGCTCACTGGCTATGATGGGAGCCTTTTGTTTACTGATCTCTTTCTAGGCTCTTTCCTCTCTATCAGAGGAACTAGATTAGGTAGTGGTATTTTGGGTGTCCCCCTTCACAGTGAAATTGGGGATATCATAGATCTAGTCACTGAGTTAGTAGTCAGCCTGATTCAGAGGCCCCCTGTGCCCTTGGCTTTTCCTACTGCTACAGCCTCAGGCAGCTCTGGCAGTAGCCCTTTTTCAGGTTCTTCTCAAAAGAGATGAGGCTGCTGAAATTCTTTGGCTGCCACTACTGCTTCTAAATCTGTAACTCTTGGGTCTCTGGTATCAGCACACTCACTACTTTGCATTTCTGTTCATTTTTACCCAGATGATTATCTTGCTTTTGGACTCAGCTGGTTTTGGGGAGCAATACGTTG	
')))((+)&%&&&'''').--.//2766555569DDDJIEDB@=>>DDGJJHRSMSLNSLSSSSORNLKNSRSKPLJLSSQIIKIIPNPLLKSSSSLJNLOPQSINMKSSSSSSKMSRSHPJFIBDEABJFAJJIGFDFDHFEELLQKOMQHIHGGSJSKSPSKKSPSSIIKFKJLA<JGJIJSMSMSIOLPHMSMKLKSJSLJRPSOONJLSOKLMPNMOSSQLNOMSMMMKIKHHNMPNO;998999CCCSIQMMOLOSQHSMIMQKKNSKSSRMSONJJLLOLLMJMLKMMSSSSRKMKSNSRSQSSQNLKSSMMJSINLOSNSNI?=92;7GFLSKKNSOSNOSSSKSSSSSPJHNPOSSLKIIE??@IMNJMJSKKRLOSSMLSLSLILJJEBBFSLOIOSJSJSRMCBCQKKJSJNJJSKHHEPIJIMJJSMSSSLIGGJKPSOSPSHSMSRMISOOQPKKISMSQSSOSSPLMJILLIKLSJKLLMLSSSSNKNLLSQOMSMKNJKHJNNJKNIIOGMA=::))))(()-:;F>>>>;96665/.,---)((//DEHFD987776:<>@BBA@=<<=BC99999DCAESLLKMIIJ==?JLKJJKSSNPKKQSMMIGKILSHFFHGHGSJMKNLOKMLJNKJJSKOKILILSLLKGIJPLQNNGIIQLSOGEGFGHE;;;64+++...+***+34?BCEFFGBBCCDMIEHGE;FBA@?640.../9?BDDFHJODABJCKSJJHHGDEFHGHHHMOKIHIJJGSSJMFKGMSMLLQJJLKNLPSNSCACDBBAA>55888>ABEF?DFHEEDFDGHJHJIIJPJHMJMQKJKLIJJJKLLKHJLSNKISJIJMGKKNPSJMOKQMKNKMIJRPJEHB;;EFFMJLGJ==;9;;KIJHKGHIHJOKFFFGFGIHIIJLFDEDEKPINIJOMIHHIHGIHSOSHKFJJHKH556A600FHGIKHJBBEGFHLGHGGFGHGHINJSIFIGKHJMINKKIJFDACDCCCABDHGQISHJKHGPHOLNJGIINJNI?=<<;??=><=333658986::ABGJJHJKFJIJJGIEEHIDBEFGGJHGGJHILJEFDGFIGEIJJHMKGJGHKMGHGFHFIE@BADEJJEGFLGQJSIJKGIIDGFHHGDACCHEEDDCELKSIFEFGJGHFCCDGHDDFEGEGDCCDIEJHFCCDEBB@@AEEHHCCBEIGIFFFHHHECFFHHOIGGEHKJHCBDDLGGFHGIHPKJLJIGKFHFEDDCGFDDDDHJLGFFEFGLKPGDEFFFOJGFDFKIJOISDB@BCDEJKJIECGFGLKHDBDDHPMGGFHLMLJFDDDFHJIJLKKFECCHFCJJLFCDDSJHFFDDDEGJHGFCEIKJPOLFDCEGIJONGHLHGFGHOJHFEDEIJHNKSLSOJJHPHHHECBCBDDGIJMGSHILJIJMHQJHHIMIA????BNLNMOJKJKHJIEEHCDJGRIIJSQKLIHJFGGEEJHKHLPOIJHHHFEHHIFOHFCA@BA?KHMJNKKSPGDCDAABCCGGJJJRIMGKJHGFHGINOKHICCCCCEFEKMMIGJJIGEFGEDCEJMOKIHBAA@ABBAFFEHJLMKGEFRLNIKJMILJLLJIFEFFEKHIEIMLIRIHKHKKJLKJKJKHIGIGEDBHFD?55JIHGHKHKKMIEDFIJGNNRIIGFFGIJSKJMGRSKISMHECD81HIMJSPKHFEGFEHHCCFKLJJMKGEFEGPJMSSSJIMLLKOJHGFHFMIKHFDBA@FFINISLNKISNHKHSSKLRIIGFEEGIGDCBABBJMLNQOMKNGNIIIGKSKIIGFFFFMSKIKNLJKJJHREFEEGSSPSSPSGGKJGLJHCEEEGSSSQJLILLHOJMJIJFDCCBFHEDDDCIONRSNMGIIHILSIMIMIOMIKPLLMEF>:9761100...,33CEHDEJJSSSSQSPSSSSSSSMPSKSSNQSLIIMPNIIHJIIGSLRNNJLSMKQLPOSSRSKLISSOM
SQOSGESMKJLKLSNSSRIMRLQOSSOMRSMSSIMLQLNKLLSRKJSMNSMMLLGQSKFS>:::;<HMQNNQMJJLNMSIGFNQRSSSNSKSKRSSOQSSSSSSHHHFHIQSSLSJLQJSBKLQOKNIJPSLMQRJNJSKSJLFGGFGMJJDEEDCSRSQSGFCLOJLLOKJIHJHIGNISSMSJSKNSSMROQSJSISPSSKSQSRHKNBCDCEJLPSKMJSKSNGKI@=BCC?MHJMKQ@?????FKISSSKLISSSSIOKNMKLMKOOKHMSHJHHFFFJGFKKSMJGMGFHIHMMMGGSGLJLD>?;5689:<BILIILLEIEEFEGKJHIHKILKRLSOSKOSOSSOSJJQSNOSKNHJMJMJMMHONSSSLLJSLLNKSLSSLNJLHGHMSMSOLJSPSLSSSKMOKD?@?@@SSSKMIJOORNSOCBB;;<==RHRSDEDEESJSLSPKJSMSNRSJMLJHJJSSSNSNKSNOLKIMRLKSKSSSSSKPMKNJKJSSSLPLKJSQSJOSKPSKOLSQILROLMLIHOMHADDDFISOSSPJIEGGHHJIOC77QRSNMLMIRKSSKSLRMKIIMIKLIQKMJKMJSQSPMNMKKCA818@9::;HLSHNIKOSSLHQHNNKSPNPKKJNSMJLLIJIJJOILGEGFD@<<<2.....CGHABCBGGMHJIJEHD><>=9>@@AAFHI@<=*DHEFFMSMSKHHIKJJGHIQPSKJOKIJJLNHIIJLLJJPKIHJE>===8:J2INICHIOLHBABLIIIJSONSDSSLQKONJJHEDDFLONJLKISSSMLJHIKIILGQSIMKQJSSIIPQIJLGHEIKHIRIRSLSQRKRSSSSJKJLGKIJNIJKFIEG?>???BBCDFGDGGINLKIKHCC@@<921111JHLKGGEEDHHIKLLIIGGHKKHIH=>GJGKHKSNKJHSMKKKMHIFHGIILMIKKKINKIIGKOSKJLJGFHFGFHIFHIGGLILMJIIHKJC@>==?HCSJJFIEHEFFA@?ABIKLKIFAJKFIJ=====HIKIOJGJKLJGGEEFFFHJFHGHJFDBCBBDEDDA/.&$	qs:i:27	du:f:7.2768	ns:i:36384	ts:i:10	mx:i:2	ch:i:1646	st:Z:2023-10-11T01:29:42.879+00:00	rn:i:377089	fn:Z:1646.pod5	sm:f:-650.938	sd:f:0.0072371	sv:Z:pa	dx:i:0	RG:Z:ca267ec1640e38bd1ec507d001335067967de6d4_dna_r10.4.1_e8.2_400bps_sup@v4.3.0
000a7f54-f079-4909-b056-324ac877d212	4	*	0	0	*	*	0	0	GTTATGTTGCATCTTACTTCGTTCAGTTACGTATTGCTATTGTGTAAGAACTTTGGCTTTTACTCTGAGTAAAATGGGGAGCCATTGGAGTGTTTGTGAACAGAGGAGTGACATGATCTCATCTTTTAAAAGGATTACTCTTACTTCTGTGTTGATAATAGACTATAATAGTGAAAACAGAAAGACCGATTAGTTACAATATTGCTATACTACAGAAAAGAGATAGTGATGACTTGGAAAGTGGTGGAGGAAGTAAAACGTGATCTGAATCTGGATGTAACGTGAAGATAGAGCAGGCAAGGTTTGTTGACAGATTAGATTTGCTATTTGAGAGAGAGCATAACTACAAAGTTTTTGGCTTGAGCCCCTGAAGAGATCGATTTGTTATTCTTTTATTAGGATGGGCAAGACTGTGGGTAGAGTAGGTTTTTGGAGGAAGATCAGGAGTTTAGTTTGGGGCATGTTAAGTTTGGCATGTCTAGTAGACACCCCATTGAAACTACTTTAGGTGTACTTTGTTGAAGTACATCCTTAGAATTTCTTTTGGTGAGAGTCTCTGGAAGGCAGACTTAATCTTTGTATATTTCAGATTGTCTTTTTCTTACCTATGCTTTTGAATGTACTTTAACGGGGTATACAATTCTGGTGGACAGTTGTTTTCCTTCACTGCTTTTGTTATTGCCAGTGAGAAACTTATGTTGGTATAATTGATGTTCCTTTGTAGATCTCCAGTTTTTTTTTCCTTTAAATCTCCTGTGGCTTGTAAGTTTTTTTATTAATCACAATTTTGCTATGCTTTTTCCAGGTATGGAACTATCGTTATGTATATACACACACATACTTTATGGTTTTTTCATTCTGTATACTGTTTTGGGTAAATTTCCCAGAAAACTATCTTCTAAGTCACGAATTCGTTAGAATTCATCTTATCTGTTTTATATGTGTGTGTGTATATATAGTTTATATATATTATATATAAAATATTATATATTATATATTATATATAAAATATTATATATTATATTATATATTATATATAAAATATTATATATTACATATTATATATTATATATAATATATTATATATTATATATTATATATTATATATAATATATTATATATTATATATTATATATAATATATTATATATTATATATTATATATAATATATAATATATTATATATTATATATAATATATTATATATAATATATAATATATTATATATAATATATAATATATTATATATTATATATAAAATATATTATATATAAAATATATTATATATAATATATATATTTATATATATTATATATAAAATATATTATATATAAAATATATATATTTATATATATTATATATAAAATATATAATATATATAATATTATATATTATATATAAAATGTTATATATTATATATAAAATATTATATATTATATATAAAATATACGTAAAATATATATTATACATAAAATATATATTATACATAAAATATATATTATACATATTATACATAAAATATATATTGTACATATTATTATATATAAAATATATATTGTACATATTATTATATATAAAATATATATTGTACATATTATTATATATAAAATATATATTATACATATTATATATAAAATATTATATATTATACATATTATATATAATATATATTTATATATATATTATATATATTATATATAATATATAATATATTATATATTATTATATACAATATATATATTATATATAAAATATACATATATAAAATATATAAAATATATATAAAATATACATATATAAAATATACATATATGTTATGTATATTATTTATATTGTGTATAAAATATACATATTATGTATATTATATATTATATATAAAATATACGTATATATTATGTATATTATATATATTATATATTATATATATATTTTTTAAGACAGGGTCTCA
CTCTGTCACCCAGGCTGGAGTGCCTTGGCATGATCTCAGCTCACTGCAACCTCTGCCTCTTGGGCTCAAGCGATCCTCCCACCTCAGCCTCCCAAGTAGCTGAGACTACAGGCAAGTGCCACCATGCCCAGCTAATTTTTGTATTTTTTTGTAGAGATAGTGTTTCACCATGTTGCCCAGGCTGGTCTCAAACTCCAGAGCTCACGTGATCTGCCTGCCTCGTCCTTCCAAAGTGCTGGGATTACAGGCGTGAGCCACCGTGCTTGGCTGAAAGTACACTTTTATTTCTAGTCTTTTTTATTTTTATACATGCCTGTTCTTATTTCACTACGGCTTGTTTTTCATGATTTCTTGTTCTTTATTTTTTTTTTTTTTTAAGGGACAGTGTCTCACTGTGTCATCCACACTGGAGTCTAGTGGTGTCATCATAACTCACTGTAGCCCTGAATTCCTGGGCTCAAGTGATCTTCCTGCTTTAGCCTCCTGAGTTGCTGAGACTACAGATGTGAGCCACCATGCCCAGTTAACTCTTCTTAAATAGAAGTTATTCCCTTATTTATCACTTCAGTATCCTAAACATAGTTATTTTAAGGTTTTTGTCAGACTGCTCCATAAAATTAGTTTCATATAGAGTGAATGTGTGTTATTACTGATTTTTTTAGGTTGTCTTTCTTAATATAAAATTTATTTCTATATTTTGAATTTTGGCTCACTGGCTATGATGGGAGCCTTTTGTTTACTGATCTCTTTCTAGGCTCTTTCCTCTCTATCAGAGGAACTAGATTAGGTAGTGGTATTTTGGGTGTCCCCCTTCACAGTGAAATTGGGGATATCATAGATCTAGTCACTGAGTTAGTAGTCAGCCTGATTCAGAGGCCCCCTGTGCCCTTGGCTTTTCCTACTGCTACAGCCTCAGGCAGCTCTGGCAGTAGCCCTTTTTCAGGTTCTTCTCAAAAGAGATGAGGCTGCTGAAATTCTTTGGCTGCCACTACTGCTTCTAAATCTGTAACTCTTGGGTCTCTGGTATCAGCACACTCACTACTTTGCATTTCTGTTCATTTTTACCCAGATGATTATCTTGCTTTTGGACTCAGCTGGTTTTGGGGAGCAATACGTTG	
')))((+)&%&&&'''').--.//2766555569DDDJIEDB@=>>DDGJJHRSMSLNSLSSSSORNLKNSRSKPLJLSSQIIKIIPNPLLKSSSSLJNLOPQSINMKSSSSSSKMSRSHPJFIBDEABJFAJJIGFDFDHFEELLQKOMQHIHGGSJSKSPSKKSPSSIIKFKJLA<JGJIJSMSMSIOLPHMSMKLKSJSLJRPSOONJLSOKLMPNMOSSQLNOMSMMMKIKHHNMPNO;998999CCCSIQMMOLOSQHSMIMQKKNSKSSRMSONJJLLOLLMJMLKMMSSSSRKMKSNSRSQSSQNLKSSMMJSINLOSNSNI?=92;7GFLSKKNSOSNOSSSKSSSSSPJHNPOSSLKIIE??@IMNJMJSKKRLOSSMLSLSLILJJEBBFSLOIOSJSJSRMCBCQKKJSJNJJSKHHEPIJIMJJSMSSSLIGGJKPSOSPSHSMSRMISOOQPKKISMSQSSOSSPLMJILLIKLSJKLLMLSSSSNKNLLSQOMSMKNJKHJNNJKNIIOGMA=::))))(()-:;F>>>>;96665/.,---)((//DEHFD987776:<>@BBA@=<<=BC99999DCAESLLKMIIJ==?JLKJJKSSNPKKQSMMIGKILSHFFHGHGSJMKNLOKMLJNKJJSKOKILILSLLKGIJPLQNNGIIQLSOGEGFGHE;;;64+++...+***+34?BCEFFGBBCCDMIEHGE;FBA@?640.../9?BDDFHJODABJCKSJJHHGDEFHGHHHMOKIHIJJGSSJMFKGMSMLLQJJLKNLPSNSCACDBBAA>55888>ABEF?DFHEEDFDGHJHJIIJPJHMJMQKJKLIJJJKLLKHJLSNKISJIJMGKKNPSJMOKQMKNKMIJRPJEHB;;EFFMJLGJ==;9;;KIJHKGHIHJOKFFFGFGIHIIJLFDEDEKPINIJOMIHHIHGIHSOSHKFJJHKH556A600FHGIKHJBBEGFHLGHGGFGHGHINJSIFIGKHJMINKKIJFDACDCCCABDHGQISHJKHGPHOLNJGIINJNI?=<<;??=><=333658986::ABGJJHJKFJIJJGIEEHIDBEFGGJHGGJHILJEFDGFIGEIJJHMKGJGHKMGHGFHFIE@BADEJJEGFLGQJSIJKGIIDGFHHGDACCHEEDDCELKSIFEFGJGHFCCDGHDDFEGEGDCCDIEJHFCCDEBB@@AEEHHCCBEIGIFFFHHHECFFHHOIGGEHKJHCBDDLGGFHGIHPKJLJIGKFHFEDDCGFDDDDHJLGFFEFGLKPGDEFFFOJGFDFKIJOISDB@BCDEJKJIECGFGLKHDBDDHPMGGFHLMLJFDDDFHJIJLKKFECCHFCJJLFCDDSJHFFDDDEGJHGFCEIKJPOLFDCEGIJONGHLHGFGHOJHFEDEIJHNKSLSOJJHPHHHECBCBDDGIJMGSHILJIJMHQJHHIMIA????BNLNMOJKJKHJIEEHCDJGRIIJSQKLIHJFGGEEJHKHLPOIJHHHFEHHIFOHFCA@BA?KHMJNKKSPGDCDAABCCGGJJJRIMGKJHGFHGINOKHICCCCCEFEKMMIGJJIGEFGEDCEJMOKIHBAA@ABBAFFEHJLMKGEFRLNIKJMILJLLJIFEFFEKHIEIMLIRIHKHKKJLKJKJKHIGIGEDBHFD?55JIHGHKHKKMIEDFIJGNNRIIGFFGIJSKJMGRSKISMHECD81HIMJSPKHFEGFEHHCCFKLJJMKGEFEGPJMSSSJIMLLKOJHGFHFMIKHFDBA@FFINISLNKISNHKHSSKLRIIGFEEGIGDCBABBJMLNQOMKNGNIIIGKSKIIGFFFFMSKIKNLJKJJHREFEEGSSPSSPSGGKJGLJHCEEEGSSSQJLILLHOJMJIJFDCCBFHEDDDCIONRSNMGIIHILSIMIMIOMIKPLLMEF>:9761100...,33CEHDEJJSSSSQSPSSSSSSSMPSKSSNQSLIIMPNIIHJIIGSLRNNJLSMKQLPOSSRSKLISSOM
SQOSGESMKJLKLSNSSRIMRLQOSSOMRSMSSIMLQLNKLLSRKJSMNSMMLLGQSKFS>:::;<HMQNNQMJJLNMSIGFNQRSSSNSKSKRSSOQSSSSSSHHHFHIQSSLSJLQJSBKLQOKNIJPSLMQRJNJSKSJLFGGFGMJJDEEDCSRSQSGFCLOJLLOKJIHJHIGNISSMSJSKNSSMROQSJSISPSSKSQSRHKNBCDCEJLPSKMJSKSNGKI@=BCC?MHJMKQ@?????FKISSSKLISSSSIOKNMKLMKOOKHMSHJHHFFFJGFKKSMJGMGFHIHMMMGGSGLJLD>?;5689:<BILIILLEIEEFEGKJHIHKILKRLSOSKOSOSSOSJJQSNOSKNHJMJMJMMHONSSSLLJSLLNKSLSSLNJLHGHMSMSOLJSPSLSSSKMOKD?@?@@SSSKMIJOORNSOCBB;;<==RHRSDEDEESJSLSPKJSMSNRSJMLJHJJSSSNSNKSNOLKIMRLKSKSSSSSKPMKNJKJSSSLPLKJSQSJOSKPSKOLSQILROLMLIHOMHADDDFISOSSPJIEGGHHJIOC77QRSNMLMIRKSSKSLRMKIIMIKLIQKMJKMJSQSPMNMKKCA818@9::;HLSHNIKOSSLHQHNNKSPNPKKJNSMJLLIJIJJOILGEGFD@<<<2.....CGHABCBGGMHJIJEHD><>=9>@@AAFHI@<=*DHEFFMSMSKHHIKJJGHIQPSKJOKIJJLNHIIJLLJJPKIHJE>===8:J2INICHIOLHBABLIIIJSONSDSSLQKONJJHEDDFLONJLKISSSMLJHIKIILGQSIMKQJSSIIPQIJLGHEIKHIRIRSLSQRKRSSSSJKJLGKIJNIJKFIEG?>???BBCDFGDGGINLKIKHCC@@<921111JHLKGGEEDHHIKLLIIGGHKKHIH=>GJGKHKSNKJHSMKKKMHIFHGIILMIKKKINKIIGKOSKJLJGFHFGFHIFHIGGLILMJIIHKJC@>==?HCSJJFIEHEFFA@?ABIKLKIFAJKFIJ=====HIKIOJGJKLJGGEEFFFHJFHGHJFDBCBBDEDDA/.&$	qs:i:27	du:f:7.2768	ns:i:36384	ts:i:10	mx:i:2	ch:i:1646	st:Z:2023-10-11T01:29:42.879+00:00	rn:i:377089	fn:Z:1646.pod5	sm:f:-650.938	sd:f:0.0072371	sv:Z:pa	dx:i:-1	RG:Z:ca267ec1640e38bd1ec507d001335067967de6d4_dna_r10.4.1_e8.2_400bps_sup@v4.3.0
000a7f54-f079-4909-b056-324ac877d212;c2770385-48c0-4796-8763-49cbfa4cde6d	4	*	0	0	*	*	0	0	GTAAGAACTTTGGCTTTTACTCTGAGTAAAATGGGGAGCCATTGGAGTGTTTGTGAACAGAGGAGTGACATGATCTCATCTTTTAAAAGGATTACTCTTACTTCTGTGTTGATAATAGACTATAATAGTGAAAACAGAAAGACCGATTAGTTACAATATTGCTATACTACAGAAAAGAGATAGTGATGACTTGGAAAGTGGTGGAGGAAGTAAAACGTGATCTGAATCTGGATGTAACGTGAAGATAGAGCAGGCAAGGTTTGTTGACAGATTAGATTTGCTATTTGAGAGAGAGAGCATAACTACAAAGTTTTTGGCTTGAGCCCCTGAAGAGATCGATTTGTTATTCTTTTATTAGGATGGGCAAGACTGTGGGTAGAGTAGGTTTTTGGAGGAAGATCAGGAGTTTAGTTTGGGGCATGTTAAGTTTGGCATGTCTAGTAGACACCCCATTGAAACTACTTTAGGTGTACTTTGTTGAAGTACATCCTTTAGAATTTCTTTTGGTGAGAGTCTCTGGAAGGCAGACTTAATCTTTGTATATTTCAGATTGTCTTTTTCTTACCTATGCTTTTGAATGTACTTTAACGGGGTATACAATTCTGGTGGACAGTTGTTTTCCTTCACTGCTTTTGTTATTGCCAGTGAGAAACTTGTGTTTGGTATAATTGATGTTCCTTTGTAGATCTCCAGTTTTTTTTCCTTTAAATCTCCTGTGGCTTGTAAGTTTTTTTATTAATCACAATTTTGCTATGCTTTTTCCAGGTATGGAACTATCGTTATGTATATACACACACATACTTTATGGTTTTTTCATTCTGTATACTGTTTTGGGTAAATTTCCCAGAAAACTATCTTCTAAGTCACGAATTCGTTAGAATTCATCTTATCTGTTTTATATGTGTGTGTGTATATATAGTTTATATATATTATATATAAAATATTATATATTATATATTATATATAAAATATTATATATTATATTATATATTATATATAAAATATTATATATTACATATTATATATTATATATAATATATTATATATTATATATTATATATTATATATAATATATTATATATTATATATTATATATAATATATTATATATTATATATTATATATAATATATAATATATTATATATTATATATAATATATTATATATAATATATAATATATTATATATAATATATAATATATTATATATTATATATAAAATATATTATATATAAAATATATTATATATAATATATATATTTATATATATTATATATAAAATATATTATATATAAAATATATATATTTATATATATTATATATAAAATATATAATATATATAATATTATATATTATATATAAAATGTTATATATTATATATAAAATATTATATATTATATATAAAATATACGTAAAATATATATTATACATAAAATATATATTATACATAAAATATATATTATACATATTATACATAAAATATATATTGTACATATTATTATATATAAAATATATATTGTACATATTATTATATATAAAATATATATTGTACATATTATTATATATAAAATATATATTATACATATTATATATAAAATATTATATATTATACATATTATATATAATATATATTTATATATATATTATATATATTATATATAATATAATATATTATATATTATTATATACAATATATATATTATATATAAAATATACATATATATAAAATATATAAAATATATATAAAATATACATATATAAAATATACATATATGTTATGTATATTATTTATATTGTGTATAAAATATACATATTATGTATATTATATATTATATATAAAATATACGTATATATTATGTATATTATATATATTATATATTATATATATATTTTTTAAGACAGGGTCTCACTC
TGTCACCCAGGCTGGAGTGCCTTGGCATGATCTCAGCTCACTGCAACCTCTGCCTCTTGGGCTCAAGCGATCCTCCCACCTCAGCCTCCCAAGTAGCTGAGACTACAGGCAAGTGCCACCATGCCCAGCTAATTTTTGTATTTTTTTGTAGAGATAGTGTTTCACCATGTTGCCCAGGCTGGTCTCAAACTCCAGAGCTCACGTGATCTGCCTGCCTCGTCCTTCCAAAGTGCTGGGATTACAGGCGTGAGCCACCGTGCTTGGCTGAAAGTACACTTTTATTTCTAGTCTTTTTTATTTTTATACATGCCTGTTCTTATTTCACTACGGCTTGTTTTTCATGATTTCTTGTTCTTTATTTTTTTTTTTAAGGGACAGTGTCTCACTGTGTCATCCACACTGGAGTCTAGTGGTGTCATCATAACTCACTGTAGCCCTGAATTCCTGGGCTCAAGTGATCTTCCTGCTTTAGCCTCCTGAGTTGCTGAGACTACAGATGTGAGCCACCATGCCCAGTTAACTCTTCTTAAATAGAAGTTATTCCCTTATTTATCACTTCAGTATCCTAAACATAGTTATTTTAAGGTTTTTGTCAGACTGCTCCATAAAATTAGTTTCATATAGAGTGAATGTGTGTTATTACTGATTTTTTTAGGTCGTCTTTCTTAATATAAAATTTATTTCTATATTTTGAATTTTGGCTCACTGGCTATGATGGGAGCCTTTTGTTTACTGATCTCTTTCTAGGCTCTTTCCTCTCTATCAGAGGAACTAGATTAGGTAGTGGTATTTTGGGTGTCCCCCTTCACAGTGAAATTGGGGATATCATAGATCTAGTCACTGAGTTAGTAGTCAGCCTGATTCAGAGGCCCCCTGTGCCCTTGGCTTTTCCTACTGCTACAGCCTCAGGCAGCTCTGGCAGTAGCCCTTTTTCAGGTTCTTCTCAAAAGAGATGAGGCTGCTGAAATTCTTTGGCTGCCACTACTGCTTCTAAATCTGTAACTCTTGGGTCTCTGGTATCAGCACACTCACTACTTTGCATTTCTGTTCATTTTTACCCAGATGATTATCTTGCTTTTGGAC	
((<<:9887653333;>>=<>==::::@=GSGDCFSFHFGEDSSCCBB@??>=>>SDDENSIDBCAGEGBEEBFCCCFSDEGFBAAGABSBF>>>=@<:8687677?RDECBBBBFQESFBBACADDDEEFDSDOBBBEDEBBCCEGGECASESCBB=;54449?@ABBB@B?CEREFCDGDGCGDAJFCCFDCGSDSCFAEDA@DGCSEICDFSDEB@@BD>>>>IBBDEGHSCFCBEFSJDCFCGFFCCDBAAC@ABADJAESSDDHGCECJDKASEGBS@@?BCABBB??@E?BIBDGBBDDESEFDDSGSSEBHDSFDDDBFSPBBDIDDSSCA?DBGCDCJ?@EDCESCCB?ISSDDSSNGCBE@?999:GBBBB???@@GACSSCISDEDEDSHFDIDDCCCB?=@B@CDISC?=?>AEDDDEFNDB?A>@E=<?@ACSB@BEESHICSDBFECSFDCNIFSDCB@CEDBBCEBFSECHFDSSGBDCAGAHSESHDESSDECHB?>=?=;95457C@CCDKHJCFFEEBBDDC@CEFEMSDSLSGGBA=433@ECA?<;6579:BSEDSMGSSKDBDIHDGCGHIESDDFDDBFSIB?@@AHDBHSSLEHDBSSFDCCB=>>=CDDESECSSGSS?>>>IBCDIGEMFA@???>>;@<?;<==IA@C@BAE>;;CACDS@BCBDD@BFBDSSSSSCAAACBEDSFDDDSDSSJE@9879;;;<@CA>:<<;?DAGABCCIDGSSF@D?@>>C@SSKDAAFDSCSCCDFFFCJCCHAABB@ABNSC??>MEFSGIBJJSHKGSSDFSSSEBEAFCDCCDA@>>>>ESMSADDAABCGDSECCBSDSDFFHCPILEEDEEDBBDSBABBBCDIFEBSRJIIFJCEFSFEACGDE@?DEBSSFSOCA<;==CAA?AIA><621255777=ABB;>ECSDCFAECBFSESFFDFCECDCBGHDCDDSEEC@?ADACJE@CSJDHEEEFKDBFCDD?@ADCA>88??BDEB?C==<;889<>ACD<68:;;969:>CSD@9?ABDD>34687:9::;;<@;8<A=AA=78>C>?;89;::977;@BCC?<9:<==:99<@@A<622/.----01265324.+-,3359778<:>EAD;346669979>>=99878:743259?SBACCCMDFBDLAHDKFCDBBFG@C>CICA?BABCCAA:>FB>:947SCSE@ABBCBDEDAAFCCFCACIAFCFSCBDCDGSC@B>A=<;76642//1478=>F=5:SDIEESEDDSDFSS@BDBFDHBDBB@BADSIJ?==:;;AAFSCD=@HFGSCCBSCSSDDAFFCSFFDFASECA@EAACCDD<>BAEABFSSDFDAEBCCKDF@BBCSSCEFSGFEHIGHCECH?BSCSIAGCSFFGBBISADFSCACFSCAA<??DAEBEJBSHSSFSESACEB;<=CESDDODJSCS>BA?==<889<<=<<>@ESIBHDECA?=<==@CBGS><;8<AFFACJHSDIDBDD?98779==B<<?BE?DEEBGSGFSDHCDSBB@CA@BSSDMI@DOA@IQCCCECSEHSCESOCHIEA<=>DCBEQBGFBFESCBCDBCSDDFEHCGIB?@@@>@B?=40;:EHDDB@>==>=?B??:631/0389?CC669<:GAA<;<75?=@CBA<57:;;:76@=9;@BB?999@ABHDDAF=8<DDEAB@SSGESSEACDBCED>BECSCEBGFGKHSIGGDCESCSCSLDDBBACCDCECEHSCSSEDEACFEFGCBC7D>@?@CDDCC@FDDSCSBBFBBJGBSFGCCSMDDSEFDIBEISHSFGSCBBE<=DRFSSGGSEGSCCBFBDNSFDCDAA@ABFDC=?SCBSC?=:;@B>;;:;?BESEGG;;?SDBBDAEB@FJDDBCCD><@0///DASHEEC@>@?SDEBA>>>CGGSC@@BFDCBCHDGCBCA@ACFBSBAACGAAAIDDGKDBDBDDCA@C?<==DAAA@IEFCBCGDD
HI?ASDDESBPSGECJGSSGEDCSDCECFQD?A?@AGBCFGGCEFSSE>;:::7421+*+59=?@DHBCCDSCCSSBBBCDEESD@DFCBEDJIIKCDADHSDDDGCDJBA@@?;;9:0../::>@AHCGCEBDAB?>@?AC@@A@DFDJIBCJBSENSDBJFJMCDDSSEEDELSAEIEEBA>=<<@=>756A?===>?:99SSS=9;AAB=;;@?DB@==<<?HEACB@GDCSGBA@DJEBBIFFSSSBCC=ECDCCHBC@SDID>99<<CA@>@><=<=DCCEGBIDSCSDGDDCSEECCBEEDSEEBCFFCCBJHCHKSJSHIH@<:ABGASSEMBD@BEECSG?DFSGGFCDDDFESDSSCAE<AACBECB@@>>?>>;;;:AFCGBDB7755777<<<=ACESIDEACFCSSBSBDF@@==@CJ@DFEGFFSESFDEKSEBDSBFEA>=<<>BDGBA@=69?A@A>====;;;CGDEGEFBBAHHE@=>>CBDSSDC?=;<<7?AFBS?@LEKSFESCGEDFB>@@<BBABSFEDFCJEBDBBC>><=?>ABBSSD@A6><((();;BC=;=?CF<>BDSIFD?>>AA@?@ACDHSFBCCCFSBGHHFFGMHDDEDCAD?@BFGRBABAEBSFCEGC><==?>=??CH@D@=<9<?AH?I@D86@DOK@EEDEKGDFFBCFCDDFAHA@BC@@BAISJDFHFESCESD???F>>DHJGMBAAAEBSCBBHDBF@BBCAECBHGCCBB???><;<>?FH?>@ACCFFCGCSCLFGDSGSFEFCHGAEAAB<>@CSBBS;84334<>?ALAAAFJCSJCBSCDEGCCCA??@HE<;;SSH@@BDCEEBBAC@>=>@B??C=>?@DDB@??AMCDCFESOEDDDLHD>==>@B?>@@CDACHCE2223@BDGCDFCAD=;;5556F@C@ACHB@;;<<<AA=<<;=:88:=><<BDGC@BCSS?><>=>?CCC@;:99<:469@@DCAA:	qs:i:27	dx:i:1	mx:i:2	ch:i:1646	st:Z:2023-10-11T01:29:42.879+00:00	RG:Z:ca267ec1640e38bd1ec507d001335067967de6d4_dna_r10.4.1_e8.2_400bps_sup@[email protected]

Does this make sense? Can you elaborate a bit on this?

Thanks for your time,

Davide

@tijyojwad
Collaborator

tijyojwad commented Mar 22, 2024

@davidebolo1993 This is a super useful analysis, and some of it is certainly not expected.

A simplex read can end up in multiple duplex reads, but it should be a really low number (e.g. in your case it is ~0.3%). In fact, we put in some changes to limit each simplex read to at most 2 duplex reads (i.e. it pairs either with the read immediately preceding it or with the one immediately succeeding it). So a bug has crept in here which needs to be looked at.

> I thought that 0 and -1 tags were mutually exclusive but it doesn’t seem the case from this example

Indeed, they should be mutually exclusive. We will take a look at this.
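
For anyone who wants to run the same two checks on their own data, here is a minimal sketch in plain Python. It assumes the read names and `dx` values have already been extracted into `(read_name, dx)` pairs, as in the two-column listings earlier in this thread, and that duplex read names are the two parent ids joined by `;`. The function name `audit_dx_tags` and the `max_pairs` parameter are hypothetical and are not part of dorado.

```python
from collections import Counter, defaultdict


def audit_dx_tags(records, max_pairs=2):
    """Audit (read_name, dx) pairs taken from a duplex BAM.

    Assumptions (not dorado API): duplex reads carry dx == 1 and are
    named "<parent_a>;<parent_b>"; simplex reads carry dx == 0 or -1.
    """
    pair_count = Counter()           # simplex id -> number of duplex reads using it
    simplex_tags = defaultdict(set)  # simplex id -> set of dx values observed

    for name, dx in records:
        if dx == 1:
            # A duplex read contributes one count to each of its parents.
            for parent in name.split(";"):
                pair_count[parent] += 1
        else:
            simplex_tags[name].add(dx)

    # Simplex reads used by more duplex reads than the intended limit.
    over_limit = {rid: n for rid, n in pair_count.items() if n > max_pairs}
    # Simplex ids seen with both dx:i:0 and dx:i:-1, which should not happen.
    conflicting = sorted(
        rid for rid, tags in simplex_tags.items() if {0, -1} <= tags
    )
    return pair_count, over_limit, conflicting


# Example using the read ids quoted at the top of this issue.
records = [
    ("d69f94b2-51d2-4c61-8c3b-7104c6cccc2a;0ee988dd-2227-47f7-ab19-99acfc66d686", 1),
    ("ed2df147-bb5c-4215-98e6-69b7ed90b01c;d69f94b2-51d2-4c61-8c3b-7104c6cccc2a", 1),
    ("d69f94b2-51d2-4c61-8c3b-7104c6cccc2a", -1),
    ("0ee988dd-2227-47f7-ab19-99acfc66d686", -1),
]
pair_count, over_limit, conflicting = audit_dx_tags(records)
print(pair_count["d69f94b2-51d2-4c61-8c3b-7104c6cccc2a"], over_limit, conflicting)
```

With the intended behaviour described above, `over_limit` and `conflicting` should both come back empty; any entries in either point at reads affected by the bug.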

@tijyojwad tijyojwad reopened this Mar 22, 2024
@davidebolo1993
Author

Hey @tijyojwad, any updates on this?

Thanks,

Davide

@tijyojwad
Collaborator

Hi @davidebolo1993 - unfortunately not yet, but I'm starting to look into this now. Sorry for the delay!

@tijyojwad
Collaborator

This will take a bit more time, @davidebolo1993, but hopefully we'll have some fixes in place for the next major release.
