What can we accomplish right now? #8

Open
scharch opened this issue Aug 26, 2020 · 2 comments

Comments

@scharch
Collaborator

scharch commented Aug 26, 2020

We've identified a few candidate datasets (thanks, @matsohlin). At some point, we will use the benchmarking pipeline to start to understand how different tools approach the potential problems we're looking at, but what can we do in the meantime? Is it useful to process these datasets with some specific tool (Immcantation, SONAR, ...) and look at the results?

@bcorrie

bcorrie commented Sep 2, 2020

What would these candidate data sets be? If there is a specific candidate data set (from a specific study or perhaps a "simulated" data set) and it is annotated with multiple tools, it is possible for us to put that in an iReceptor repository for people to access and download it. If someone annotates some data, provides some AIRR TSV and AIRR Repertoire metadata, we can easily load it... Not sure how useful that would be, but we can "accomplish this right now" 8-)

I would hesitate a bit to say we would make it available through the iReceptor Gateway for general searching by the research community, as we don't handle a single data set that is annotated with multiple annotation tools too gracefully on the Gateway at the moment. This could be confusing to the user so we would want to manage that. We are working on it...

@williamdlees

We envisage multiple datasets - some simulated and some real-world. Some of them may exhibit problems - read errors, chimerism, and so on. Not that these problems don't exist in other datasets in the wild, but perhaps we should try not to mix them with other datasets in iReceptor+ to prevent them coming up in searches.

3 participants