Add sequence to exported GFF3 #281

Merged
garrettjstevens merged 6 commits into main from Bug280_Sequence_data_missing_exported_GFF3
Oct 25, 2023

kyostiebi
Contributor

FASTA is now included in the exported GFF3. Please merge this only after Bug_251_Download_not_working has been merged.

@kyostiebi kyostiebi added the bug Something isn't working label Sep 14, 2023
@kyostiebi kyostiebi self-assigned this Sep 14, 2023
@garrettjstevens
Contributor

Has this been tested with very large exports? From the code, it looks like all the sequences are loaded into memory. We want to avoid that because it raises server memory usage and will be slower. I think the sequenceStream will need to use a pipeline like the featureStream does. Just as an outline, that might look like this:

pipeline(
  this.refSeqChunksModel
        .find({ refSeq: refSeqDocs }) // get chunks for all refSeqs in a single query
        .sort({ n: 1 }) // make sure the sort accounts for having all refSeqs
        .cursor(),
  new Transform({
    construct(callback) {
      this.printFasta = true;
      this.printSeqName = true;
      this.remainingLastLine = ''
      callback();
    },
    transform(chunk, encoding, callback) {
      // basically the same logic that's already there, but using `this.push` instead of `sequenceStream.push`
      callback();
    },
    flush(callback) {
      // emit whatever is left of the last partial line, then signal completion
      if (this.remainingLastLine) this.push(this.remainingLastLine)
      callback();
    },
  }),
  (error) => {
    if (error) {
      this.logger.error('GFF3 export failed')
      this.logger.error(error)
    }
  },
);

The node docs on streams might be helpful here: https://nodejs.org/docs/latest-v18.x/api/stream.html
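
To make the outline above concrete, here is a minimal, self-contained sketch of a FASTA-writing Transform. It is not the code from this PR. The `SeqChunk` shape, the `createFastaTransform` and `toFasta` names, and the 80-character default line width are all assumptions made for illustration, and an in-memory array stands in for the database cursor. It assumes chunks arrive grouped by sequence and in order.

```typescript
import { Readable, Transform, Writable } from 'node:stream'
import { pipeline } from 'node:stream/promises'

// Hypothetical chunk shape for illustration; the real documents may differ.
interface SeqChunk {
  refSeqName: string
  sequence: string
}

// Builds a Transform that takes SeqChunk objects and emits FASTA text.
// Only one partial line is held in memory at a time.
function createFastaTransform(lineWidth = 80): Transform {
  if (lineWidth < 1) {
    throw new RangeError('lineWidth must be at least 1')
  }
  let currentName: string | undefined
  let remainder = '' // bases that do not yet fill a whole line

  // Emit the leftover partial line of the current sequence, if any.
  const flushRemainder = (stream: Transform): void => {
    if (remainder) {
      stream.push(`${remainder}\n`)
      remainder = ''
    }
  }

  return new Transform({
    writableObjectMode: true, // objects in, text out
    transform(chunk: SeqChunk, _encoding, callback) {
      // A new sequence name means: finish the previous record, start a header.
      if (chunk.refSeqName !== currentName) {
        flushRemainder(this)
        currentName = chunk.refSeqName
        this.push(`>${chunk.refSeqName}\n`)
      }
      let buffered = remainder + chunk.sequence
      while (buffered.length >= lineWidth) {
        this.push(`${buffered.slice(0, lineWidth)}\n`)
        buffered = buffered.slice(lineWidth)
      }
      remainder = buffered
      callback()
    },
    flush(callback) {
      flushRemainder(this)
      callback()
    },
  })
}

// Test helper: runs chunks through the transform and collects the output.
// A real export would pipe to the HTTP response instead of a string.
async function toFasta(chunks: SeqChunk[], lineWidth = 80): Promise<string> {
  const parts: string[] = []
  const sink = new Writable({
    write(chunk: Buffer, _encoding, callback) {
      parts.push(chunk.toString())
      callback()
    },
  })
  await pipeline(Readable.from(chunks), createFastaTransform(lineWidth), sink)
  return parts.join('')
}
```

For example, calling `toFasta` with chunks `ACGTAC` and `GTAC` for `chr1`, then `GG` for `chr2`, and a line width of 4 gives the header `>chr1` followed by `ACGT`, `ACGT`, `AC`, then `>chr2` followed by `GG`. Keeping the state in a closure rather than on `this` is only a stylistic choice here; the `construct`-based version in the outline should work equally well.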

@garrettjstevens garrettjstevens force-pushed the Bug280_Sequence_data_missing_exported_GFF3 branch from dd856e1 to b09dd8d Compare October 25, 2023 00:48
@garrettjstevens garrettjstevens marked this pull request as ready for review October 25, 2023 00:48
@garrettjstevens garrettjstevens force-pushed the Bug280_Sequence_data_missing_exported_GFF3 branch from b09dd8d to 699a5e8 Compare October 25, 2023 01:00
@garrettjstevens garrettjstevens changed the title Bug280 sequence data missing exported gff3 Add sequence to exported GFF3 Oct 25, 2023
@garrettjstevens garrettjstevens merged commit c5d7c0a into main Oct 25, 2023
4 checks passed
@garrettjstevens garrettjstevens deleted the Bug280_Sequence_data_missing_exported_GFF3 branch October 25, 2023 02:00
Labels
bug Something isn't working
Projects
Status: Done
2 participants