
Check order when multiple pairs of FASTQs are supplied #467

Open
eturro opened this issue Nov 15, 2024 · 4 comments


eturro commented Nov 15, 2024

Hi, thanks for producing this great software. I have a suggestion: add a check of the user input.

The documentation states: "The default running mode is paired-end and requires an even number of FASTQ files represented as pairs, e.g. kallisto quant -i index -o output pairA_1.fastq pairA_2.fastq pairB_1.fastq pairB_2.fastq".

It seems that the order is critical. If the user accidentally supplies the files in the order pairA_1.fastq pairB_1.fastq ... pairA_2.fastq pairB_2.fastq ..., kallisto runs without issuing any errors but outputs incorrect quantities.

The suggestion is to add a check of the user input.
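As a user-side workaround, the argument list can be built programmatically so that mates always end up adjacent. The Python sketch below is a minimal example, assuming the files follow a <sample>_1 / <sample>_2 naming convention with a .fastq or .fq extension (optionally .gz). interleave_pairs is a hypothetical helper for illustration, not part of kallisto.

```python
import re
from typing import Dict, List, Sequence

# Assumed (hypothetical) naming convention: <sample>_1.<ext> and <sample>_2.<ext>
MATE_RE = re.compile(r"^(?P<sample>.+)_(?P<mate>[12])(?P<ext>\.f(ast)?q(\.gz)?)$")


def interleave_pairs(paths: Sequence[str]) -> List[str]:
    """Reorder FASTQ paths as sampleA_1 sampleA_2 sampleB_1 sampleB_2 ..."""
    mates: Dict[str, Dict[str, str]] = {}
    for path in paths:
        m = MATE_RE.match(path)
        if m is None:
            raise ValueError(f"cannot tell which mate {path!r} is")
        pair = mates.setdefault(m["sample"], {})
        if m["mate"] in pair:
            raise ValueError(f"duplicate mate {m['mate']} for {m['sample']!r}")
        pair[m["mate"]] = path
    ordered: List[str] = []
    for sample in sorted(mates):  # alphabetical, for a deterministic order
        pair = mates[sample]
        if set(pair) != {"1", "2"}:
            raise ValueError(f"sample {sample!r} lacks a _1 or _2 file")
        ordered += [pair["1"], pair["2"]]
    return ordered


# The mis-ordered input from the report above
files = ["pairA_1.fastq", "pairB_1.fastq", "pairA_2.fastq", "pairB_2.fastq"]
ordered = interleave_pairs(files)
print(["kallisto", "quant", "-i", "index", "-o", "output", *ordered])
```

This only relies on file names, so it cannot catch files that are named correctly but contain the wrong reads; a content-based check would still be needed for that.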

Yenaled (Collaborator) commented Nov 15, 2024

I don’t really think there is a way to check for that. FASTQ files are just DNA base sequences, and there’s no way for a program to know which file should be paired with which.

eturro (Author) commented Nov 15, 2024

Can't you just check the read names (at least the first few)? They should match between files in a pair.
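A minimal sketch of the read-name comparison suggested here, in Python. It assumes uncompressed FASTQ with exactly 4 lines per record (a handle from gzip.open(path, "rt") would work the same way for .gz files) and assumes mate names differ at most by a /1 or /2 suffix or by the comment after the first whitespace. first_read_names and names_match are hypothetical helpers, not kallisto code. Since read names are sometimes altered, a mismatch is probably better reported as a warning than as an error.

```python
import io
from itertools import islice
from typing import List, TextIO


def first_read_names(handle: TextIO, n: int = 5) -> List[str]:
    """Names of the first n records of a 4-line-per-record FASTQ stream.

    Each name is the header without the leading '@', cut at the first
    whitespace (dropping comments such as ' 1:N:0:ACGT') and with a
    trailing '/1' or '/2' mate suffix removed.
    """
    names: List[str] = []
    for i, line in enumerate(islice(handle, 4 * n)):
        if i % 4 == 0:  # header line of each record
            fields = line[1:].split()
            name = fields[0] if fields else ""
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            names.append(name)
    return names


def names_match(handle1: TextIO, handle2: TextIO, n: int = 5) -> bool:
    """True if the first n read names agree between the two mate files."""
    return first_read_names(handle1, n) == first_read_names(handle2, n)


r1 = io.StringIO("@readA/1\nACGT\n+\nIIII\n@readB/1\nACGT\n+\nIIII\n")
r2 = io.StringIO("@readA/2\nTTTT\n+\nIIII\n@readB/2\nTTTT\n+\nIIII\n")
print(names_match(r1, r2))  # True
```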

Yenaled (Collaborator) commented Nov 15, 2024

That would be possible, although I often work with FASTQs whose read names have been altered. Perhaps a warning could be printed if the names don’t match. Will consider it.

eturro (Author) commented Nov 15, 2024

Alternatively, you could check that the read counts are the same for the first and second files, the third and fourth files, and so on. But that seems more complicated than simply checking the name of the first read in each file, which for correctly ordered input would normally give the pattern readA readA readB readB readC readC ...
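The first-read pattern check could look roughly like the following Python sketch. It assumes plain FASTQ whose first line is a header, and uses hypothetical helper names (first_name, check_pair_order). Files are taken in command-line order, file 2k is paired with file 2k + 1, and a warning is printed to stderr for each pair whose first reads differ, rather than aborting.

```python
import io
import sys
from typing import List, Sequence, TextIO


def first_name(handle: TextIO) -> str:
    """Name of the first FASTQ record: no '@', no comment, no /1 or /2 suffix."""
    fields = handle.readline()[1:].split()
    name = fields[0] if fields else ""
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def check_pair_order(handles: Sequence[TextIO]) -> List[int]:
    """Return 0-based indices of pairs whose two files start with different reads.

    Correctly ordered input gives first names readA readA readB readB ...
    """
    if len(handles) % 2 != 0:
        raise ValueError("paired-end mode needs an even number of FASTQ files")
    names = [first_name(h) for h in handles]
    bad: List[int] = []
    for k in range(0, len(names), 2):
        if names[k] != names[k + 1]:
            bad.append(k // 2)
            print(f"warning: pair {k // 2 + 1}: first reads differ "
                  f"({names[k]!r} vs {names[k + 1]!r})", file=sys.stderr)
    return bad


def fake_fastq(name: str) -> io.StringIO:
    """One-record in-memory FASTQ, used only for this demo."""
    return io.StringIO(f"@{name}\nACGT\n+\nIIII\n")


# Files supplied as pairA_1 pairB_1 pairA_2 pairB_2 (the mistake from the report)
swapped = [fake_fastq(n) for n in ("readA/1", "readB/1", "readA/2", "readB/2")]
print(check_pair_order(swapped))  # [0, 1]
```

Reading only one header per file keeps the cost negligible compared with the quantification itself.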

Actually, I'm surprised the program doesn't fail when the two files it treats as a pair don't contain the same number of reads.
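The read-count check is sketched below, again in Python, assuming plain FASTQ with exactly 4 lines per record; count_records and pairs_with_unequal_counts are hypothetical names. It has to read every file to the end, which is what makes it costlier than looking at the first read. Equal counts do not prove that two files are mates; they only rule out one kind of mismatch.

```python
import io
from typing import List, Sequence, TextIO


def count_records(handle: TextIO) -> int:
    """Count records in a 4-line-per-record FASTQ stream."""
    n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{n_lines} lines is not a multiple of 4; truncated file?")
    return n_lines // 4


def pairs_with_unequal_counts(handles: Sequence[TextIO]) -> List[int]:
    """0-based indices of pairs (files 2k and 2k + 1) whose read counts differ."""
    if len(handles) % 2 != 0:
        raise ValueError("paired-end mode needs an even number of FASTQ files")
    counts = [count_records(h) for h in handles]
    return [k // 2 for k in range(0, len(counts), 2) if counts[k] != counts[k + 1]]


two_reads = "@a\nA\n+\nI\n@b\nC\n+\nI\n"
one_read = "@a\nA\n+\nI\n"
print(pairs_with_unequal_counts([io.StringIO(two_reads), io.StringIO(one_read)]))  # [0]
```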
