Add capability to parse auxiliary target file #469

Open
rob-p opened this issue Dec 31, 2019 · 3 comments

@rob-p
Collaborator

rob-p commented Dec 31, 2019

This issue continues the discussion started in COMBINE-lab/pufferfish#8 and implements the feature discussed there. I'm tagging @mdshw5 to move the conversation over to this issue. As of commit 7a37e8b, we have an --auxTargetFile option that can be passed to the quant command. The format of this file is one target per line. Every target listed in this file that is part of the index will be marked as auxiliary. This means that, while these targets will be quantified, they will not have the auxiliary models applied to them; specifically, the sequence-specific, fragment-GC and position-specific bias models will not be applied to such targets. However, unlike decoy targets, they will be quantified and will appear in the quant.sf output.
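As a rough illustration of the one-target-per-line format, here is a minimal Python sketch for producing such a file. The helper name write_aux_target_file and the target names in the usage note are hypothetical; dropping blank and duplicate names is my own assumption for tidiness, not something the option is stated to require.

```python
from pathlib import Path


def write_aux_target_file(targets, path):
    """Write target names to `path`, one per line.

    Blank names and repeats are dropped (an assumption made here for
    tidiness); the first-seen order of the remaining names is kept.
    Returns the list of names that were written.
    """
    seen = set()
    names = []
    for name in targets:
        name = name.strip()
        if name and name not in seen:
            seen.add(name)
            names.append(name)
    # One target per line, each line terminated by a newline.
    Path(path).write_text("".join(f"{n}\n" for n in names))
    return names
```

For example, write_aux_target_file(["tx_A", "tx_B"], "aux_targets.txt") would produce a two-line file that could then be given to the quant command via --auxTargetFile.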

@rob-p
Collaborator Author

rob-p commented Dec 31, 2019

In addition, the ids (index in the ordered list of transcript names) that are treated as auxiliary targets in a given sample are written to the file aux_info/aux_target_ids.json. This allows tracking, in a given sample, which targets were treated as auxiliary.

@mdshw5
Contributor

mdshw5 commented Dec 31, 2019

Writing the targets to JSON is probably a good idea. My only remaining concern would be how to make it as easy as possible to compare these files against each other:

  • Does target order matter? If so, JSON parsers will almost always make an unordered object during deserialization.
  • Will a user just be able to diff or hash these files against each other? If so I’d say that’s the best solution.

@rob-p
Collaborator Author

rob-p commented Dec 31, 2019

Great questions:

  • The ids in the aux_target_ids.json file will always be listed in the order the targets appear in the original index. That means that, if the samples were quantified with the same index (which could be checked using the SeqHash values already in meta_info.json), then the ids in aux_target_ids.json will have the same interpretation and will always be directly comparable. One could then diff the ids to ensure that precisely the same set of targets was treated as auxiliary.
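The check described above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function name same_aux_targets is hypothetical, the key "index_seq_hash" is my guess at how the SeqHash value is stored, and I am assuming meta_info.json sits in the same aux_info directory as aux_target_ids.json. The parsed JSON values are compared whole, so the sketch does not depend on the exact layout of aux_target_ids.json.

```python
import json
from pathlib import Path


def load_json(path):
    """Parse a JSON file and return the resulting Python object."""
    with open(path) as fh:
        return json.load(fh)


def same_aux_targets(sample_a, sample_b, hash_key="index_seq_hash"):
    """Return True if two quant output directories used the same aux ids.

    Raises ValueError when the index hashes are missing or differ,
    because the ids are only comparable for the same index.
    `hash_key` is an assumed key name; adjust it to match meta_info.json.
    """
    dir_a = Path(sample_a) / "aux_info"  # assumed location of both files
    dir_b = Path(sample_b) / "aux_info"

    hash_a = load_json(dir_a / "meta_info.json").get(hash_key)
    hash_b = load_json(dir_b / "meta_info.json").get(hash_key)
    if hash_a is None or hash_b is None:
        raise ValueError(f"key {hash_key!r} not found in meta_info.json")
    if hash_a != hash_b:
        raise ValueError("samples were quantified against different indices")

    # Ids follow index order, so plain equality of the parsed content
    # is enough; no sorting or set conversion is needed.
    ids_a = load_json(dir_a / "aux_target_ids.json")
    ids_b = load_json(dir_b / "aux_target_ids.json")
    return ids_a == ids_b
```

Because the ids are written in index order, hashing or diffing the two aux_target_ids.json files directly should give the same answer whenever the files are serialized identically; comparing the parsed content just avoids being tripped up by whitespace differences.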
