Example of Experimental Data to Use as Input For Model #5

Open
alexturcoo opened this issue Jan 26, 2024 · 3 comments

Comments

@alexturcoo

alexturcoo commented Jan 26, 2024

Hello there,

I am trying to use this model to detect non-B structures from ONT data, but I am struggling to understand what the input should look like for experimental data. I am using ONT's most recent open-source basecaller, so my preprocessing workflow differs from the Albacore + Tombo workflow followed in your paper. I had no issues producing the simulated data, and after inspecting it, the data made a lot of sense to me. Does the data produced by the simulator reflect how the input for experimental data should look, in terms of features? Is there an example of what input data produced from experimental samples should look like? This would help me adapt my preprocessing to fit my workflow. Please let me know! Thanks

@Marjan-Hosseini
Member

Hi,
The input for our model doesn't necessarily have to be preprocessed by Albacore and Tombo. We only need per-base translocation times for a region on a chromosome. You can use more recent pipelines like Dorado and f5c/nanopolish, or any other software that can produce per-base event times.
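
As a rough illustration of "translocation times per base", here is a minimal sketch that sums event durations per read and reference position from an eventalign-style table. It is not the authors' preprocessing. The column names (`contig`, `position`, `read_index`, `event_length`) follow the usual nanopolish/f5c `eventalign` header and should be checked against your own output; the function name and the tiny in-memory table are hypothetical. Note also that `position` in eventalign output refers to a k-mer start, so mapping it to a single base is an assumption here.

```python
import csv
import io
from collections import defaultdict


def per_base_translocation_times(eventalign_tsv):
    """Sum event durations per (read_index, contig, position).

    `eventalign_tsv` is a file-like object holding tab-separated rows with
    at least the columns contig, position, read_index and event_length
    (assumed to follow the nanopolish/f5c eventalign header).
    """
    totals = defaultdict(float)
    reader = csv.DictReader(eventalign_tsv, delimiter="\t")
    for row in reader:
        key = (row["read_index"], row["contig"], int(row["position"]))
        # Several events can align to the same position; add them up.
        totals[key] += float(row["event_length"])
    return dict(totals)


# Hypothetical three-event table, for illustration only.
example = io.StringIO(
    "contig\tposition\tread_index\tevent_length\n"
    "chr1\t100\t0\t0.00125\n"
    "chr1\t100\t0\t0.00075\n"
    "chr1\t101\t0\t0.00200\n"
)
times = per_base_translocation_times(example)
```

With a real file, the same function can be called on `open("eventalign.tsv", newline="")` (the path is a placeholder). Whether the model expects these durations in seconds or in another unit is not stated in this thread, so that should be confirmed against the simulated data.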

@alexturcoo
Author

@Marjan-Hosseini. In the simulated training data, there are columns for each forward base, reverse base, and masked base. When I use experimental data, should the training data also include the bases for the negative strand? What if I only have positive-strand reads and the matching window on the negative strand is not present? Can I use a data frame with forward- and reverse-strand base translocation times in different rows? What I mean is that, for a given read, it is not always the case that a matching read is present on the other strand. Is it okay not to have reverse-strand values in the same row as the forward-strand values? Thanks.

@Marjan-Hosseini
Member

Yes, the training data includes the signal from the reverse strand as well. If the reverse strand is not available, you may use the forward signal instead, just to give the input the shape the model requires, but I would not recommend it, because the model probably wouldn't perform as expected. I'm curious to see the results if you try it.
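
The fallback described above (reusing the forward signal when no reverse-strand window exists) could be sketched as follows. This is only an illustration under stated assumptions: the function name, the dictionary keys, and the `reverse_is_placeholder` flag are hypothetical and not part of the repository, and the forward values are copied in their original order because the thread does not say whether they should be reversed.

```python
def build_model_row(forward_times, reverse_times=None):
    """Combine forward and reverse per-base times for one window.

    If `reverse_times` is None, the forward window is copied into the
    reverse slot as a placeholder and the row is flagged, so that such
    rows can be filtered or evaluated separately later.
    """
    forward = list(forward_times)
    if reverse_times is None:
        # Assumption: copy the forward values unchanged (not reversed).
        reverse = list(forward)
        placeholder = True
    else:
        reverse = list(reverse_times)
        if len(reverse) != len(forward):
            raise ValueError("forward and reverse windows must have the same length")
        placeholder = False
    return {
        "forward": forward,
        "reverse": reverse,
        "reverse_is_placeholder": placeholder,
    }


# Hypothetical three-base windows, for illustration only.
forward_only = build_model_row([0.002, 0.004, 0.001])
both_strands = build_model_row([0.002, 0.004, 0.001], [0.003, 0.001, 0.002])
```

Keeping the flag makes it possible to compare predictions on rows with a real reverse-strand signal against rows with a placeholder, which is relevant given the warning that the model may not perform as expected in the second case.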
