Add sequence to exported GFF3 #281

kyostiebi · 2023-09-14T06:36:49Z

FASTA is now included in exported GFF3. Please merge this only after Bug_251_Download_not_working has been merged.

packages/apollo-collaboration-server/src/features/features.service.ts

garrettjstevens · 2023-10-04T03:16:18Z

Has this been tested with exports of very large data sets? From the code, it looks like all the sequences are loaded into memory at once. We want to avoid that, both to keep server memory usage low and because it will be slower. I think the sequenceStream will need to use a pipeline like the featureStream does. Just as an outline, that might look like this:

pipeline(
  this.refSeqChunksModel
    .find({ refSeq: refSeqDocs }) // get chunks for all refSeqs in a single query
    .sort({ n: 1 }) // NOTE: with several refSeqs, the sort must also keep each refSeq's chunks together
    .cursor(),
  new Transform({
    writableObjectMode: true, // the cursor emits documents, not buffers or strings
    construct(callback) {
      this.printFasta = true;
      this.printSeqName = true;
      this.remainingLastLine = '';
      callback();
    },
    transform(chunk, encoding, callback) {
      // basically the same logic that's already there, but using `this.push` instead of `sequenceStream.push`
      callback();
    },
    flush(callback) {
      // emit whatever is left of the last sequence line, then signal completion
      if (this.remainingLastLine) this.push(this.remainingLastLine);
      callback();
    },
  }),
  (error) => {
    if (error) {
      this.logger.error('GFF3 export failed');
      this.logger.error(error);
    }
  },
);

The Node.js docs on streams might be helpful here: https://nodejs.org/docs/latest-v18.x/api/stream.html
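To make the "same logic, but using `this.push`" step in the outline above more concrete, here is a minimal sketch of one way the line-wrapping state could be kept between chunks. It is an illustration only, not the code that was merged: `SequenceChunk`, its `refSeqName` field, `FastaWrapper`, `fastaTransform`, and the default line width of 80 are all hypothetical names and values chosen for the example.

```typescript
import { Transform } from 'node:stream'

// Hypothetical chunk shape; the real chunk documents may look different.
interface SequenceChunk {
  refSeqName: string
  sequence: string
}

/** Re-wraps sequence chunks into fixed-width FASTA lines, one chunk at a time. */
class FastaWrapper {
  private readonly lineWidth: number
  private currentName: string | undefined = undefined
  private remainder = ''

  constructor(lineWidth = 80) {
    this.lineWidth = lineWidth
  }

  /** Returns the FASTA text that can be emitted after seeing this chunk. */
  write(chunk: SequenceChunk): string {
    let out = ''
    if (chunk.refSeqName !== this.currentName) {
      // New sequence: finish the previous one, then print the header line.
      out += this.end()
      out += `>${chunk.refSeqName}\n`
      this.currentName = chunk.refSeqName
    }
    const seq = this.remainder + chunk.sequence
    const fullLength = seq.length - (seq.length % this.lineWidth)
    for (let i = 0; i < fullLength; i += this.lineWidth) {
      out += `${seq.slice(i, i + this.lineWidth)}\n`
    }
    // Carry the incomplete last line over to the next chunk.
    this.remainder = seq.slice(fullLength)
    return out
  }

  /** Returns what is left of the current sequence's last line, if anything. */
  end(): string {
    const out = this.remainder ? `${this.remainder}\n` : ''
    this.remainder = ''
    return out
  }
}

/** Wraps FastaWrapper in a Transform that takes chunk objects and emits text. */
function fastaTransform(lineWidth = 80): Transform {
  const wrapper = new FastaWrapper(lineWidth)
  return new Transform({
    writableObjectMode: true, // input is objects, output is plain text
    transform(chunk, _encoding, callback) {
      const text = wrapper.write(chunk as SequenceChunk)
      if (text) this.push(text)
      callback()
    },
    flush(callback) {
      const text = wrapper.end()
      if (text) this.push(text)
      callback()
    },
  })
}
```

Under these assumptions, `fastaTransform()` would sit between the chunk cursor and the export output in the pipeline, so only one chunk plus at most one partial line is held in memory at a time. The sketch relies on chunks arriving grouped by sequence and in order within each sequence.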

kyostiebi added the bug label Sep 14, 2023

kyostiebi requested a review from garrettjstevens September 14, 2023 06:36

kyostiebi self-assigned this Sep 14, 2023

garrettjstevens requested changes Sep 20, 2023

packages/apollo-collaboration-server/src/features/features.service.ts Outdated

packages/apollo-collaboration-server/src/features/features.service.ts Outdated

kyostiebi requested a review from garrettjstevens September 21, 2023 08:05

kyostiebi added 4 commits October 24, 2023 18:04

Added sequence data into exported GFF3

048b4a5

FASTA is now included in exported GFF3

b54da52

Fixed small format issues

d887469

all sequence data lines are now same length in exported GFF3

fd57b0c

garrettjstevens force-pushed the Bug280_Sequence_data_missing_exported_GFF3 branch from dd856e1 to b09dd8d Compare October 25, 2023 00:48

garrettjstevens marked this pull request as ready for review October 25, 2023 00:48

garrettjstevens approved these changes Oct 25, 2023

Use more streams

699a5e8

garrettjstevens force-pushed the Bug280_Sequence_data_missing_exported_GFF3 branch from b09dd8d to 699a5e8 Compare October 25, 2023 01:00

Fix test

6cd8da9

garrettjstevens changed the title ~~Bug280 sequence data missing exported gff3~~ Add sequence to exported GFF3 Oct 25, 2023

garrettjstevens merged commit c5d7c0a into main Oct 25, 2023
4 checks passed

garrettjstevens deleted the Bug280_Sequence_data_missing_exported_GFF3 branch October 25, 2023 02:00

garrettjstevens mentioned this pull request Nov 16, 2023

Sequence data is missing in exported GFF3 file #280

Closed
