-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ability for indexed sequence file to be uploaded and used in an assembly #432
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing I didn't think about when we were discussing this before is that any Change
can't have nodejs-specific code in it (i.e. no imports from node:something
). This is because this code has to run both in the browser and on the server. I think the nodejs-specific code that got introduced is what is causing the tests to fail.
What we'll need to do instead is that in packages/apollo-collaboration-server/src/files/files.service.ts
we'll add a getFileHandle
method that returns a file handle from a file document, very similar to how getFileStream
in that same service file returns a file stream from a file document. Then the implementation of LocalFileGzip
will be in collaboration server (e.g. in packages/apollo-collaboration-server/src/files/filesUtil.ts
) instead of the shared code.
Then the code in the Change
will look something like this:
const { filesService } = backend
const { getFileHandle } = filesService
new IndexedFasta({
fasta: getFileHandle(fastaDoc),
fai: getFileHandle(faiDoc),
})
(Don't forget to add the definition for getFileHandle
in packages/apollo-common/src/Operation.ts
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be combined with the add-fasta
command, they should not be separate commands.
Also, add-fasta
and add-gff
are unclear as to their function based on the command name. So what we need to do is:
- Rename
add-gff
toadd-from-gff
- Rename
add-fasta
toadd-from-fasta
- Combine
add-file
intoadd-from-fasta
Another thing to consider, maybe the argument to add-from-fasta
should be positional and not a flag, since it's required in all cases. What do you think of this for example commands:
apollo assembly add-from-fasta ./my-genome.fa
apollo assembly add-from-fasta ./my-genome.fa.gz
apollo assembly add-from-fasta 507f1f77bcf86cd799439011
apollo assembly add-from-fasta ./my-genome.fa.gz --not-editable
apollo assembly add-from-fasta ./my-genome.fa.gz --not-editable --fai ./my-genome.fa.gz.fai --gzi ./my-genome.fa.gz.gzi
apollo assembly add-from-fasta 507f1f77bcf86cd799439011 --not-editable --fai 507f1f77bcf86cd799486197 --gzi 507f1f77bcf86cd785963292
description: | ||
'File type or "autodetect" for automatic detection.\n\ | ||
NB: There is no check for whether the file complies to this type', | ||
options: ['text/x-fasta', 'text/x-gff3', 'autodetect'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to add new options here, since we want to be able to tell the type later. Types I can think of are:
- text/x-fasta
- text/x-gff3
- application/x-bgzip-fasta
- text/x-fai
- application/x-gzi
Authorization: `Bearer ${accessToken}`, | ||
'Content-Type': type, | ||
'Content-Length': String(size), | ||
'Content-Encoding': contentEncoding, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leave out Content-Encoding
altogether unless it's gzip, maybe something like:
const headers = new Headers({
Authorization: `Bearer ${accessToken}`,
'Content-Type': type,
'Content-Length': String(size),
})
if (isGzip) {
headers.append('Content-Encoding', 'gzip')
}
}) | ||
|
||
let contentEncoding = '' | ||
if (file.endsWith('.gz')) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pass a param to uploadFile
instead of checking the file name.
const gz = createGzip() | ||
await pipeline(stream, gz, fileWriteStream) | ||
|
||
if (originalname.endsWith('.gz')) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check the Content-Encoding
Content-Type
instead of the filename.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be folded in to the existing AddAssemblyFromFileChange
instead of being a new change.
}), | ||
force: Flags.boolean({ | ||
char: 'f', | ||
description: 'Delete existing assembly, if it exists', | ||
}), | ||
'no-db': Flags.boolean({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no-db
should be something like not-editable
, since that what is relevant to the user (the database is an internal detail they don't need to worry about)
description: wrapLines( | ||
"Do not load the fasta sequence into the Apollo database. \ | ||
This option assumes the fasta file is bgzip'd with `bgzip` and indexed with `samtools faidx`.\ | ||
Indexes should be named <my.fasta.gz>.gzi and <my.fasta.gz>.fai", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Auto-detection is good, but there should also be a way to explicitly pass the fai and gzi files.
But most likely there will be changes to clunky code
@dariober the issue with the GZI filehandle we were seeing has been fixed |
All tests except those for chekcs passing
* CLI can upload and add assembly from gzip files (#405 and #407) * CLI can add assembly in non-editable mode with `--no-db` flag, i.e. without loading sequence to mongodb (#406). Only bgzip'd & indexed fasta file are supported, not plain fa/fai files. See tests in test.py: `testFeatureChecksIndexed`, `testFileUploadGzip`, `testAddAssemblyWithoutLoadingInMongo`
But most likely there will be changes to clunky code
All tests except those for chekcs passing
765ceab
to
1302ce4
Compare
620ac25
to
9d149dc
Compare
7e49495
to
db455a3
Compare
Fixes #405, fixes #406, fixes #407, fixes #424