-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Update ingest-fastq, allow to match samples against assay table to determine collections names (#198) #203
Conversation
…rmine collections names (#198)
Please format your code with black: |
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #203 +/- ##
==========================================
- Coverage 77.84% 77.02% -0.82%
==========================================
Files 98 98
Lines 7330 7434 +104
==========================================
+ Hits 5706 5726 +20
- Misses 1624 1708 +84 ☔ View full report in Codecov by Sentry. |
I can't run this:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've only looked at these parts so far, submitted some comments. As I've never used this function and it's quite complex, someone else should also review.
and in_column.lower() | ||
in re.sub("(Parameter Value|Comment|Characteristics)", "", head).lower() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This re.sub does not remove the square brackets of these column headers. Intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sort of, I want to remove these part of the column names since they are "invisible" and if people have a column named i.e. 'Comment' then checking for that string as part of the name will make problems. The brackets should not have that issue and removing both them requires a much more complicated regex.
@bobermayer This operator was introduced in Python 3.8, which is at the moment the oldest supported version. |
Added 'FAILED' to LZ status that is okay for usage Linting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me and works as intended
This was initially only intended to solve #198 , but I ended up adding some additional changes / enhancements to other aspects of
cubi-tk sodar ingest-fastq
and other (cubi-tk snappy itransfer
functions).Added Features:
--match-column
flag that enables comparing the contents of the "(??)" match group from the src-regex against any specified column of the assay table of a given project (automatically determined from destination) to determine the (new) 'collection_name' parameter in the destination pattern based on the last material column of the assay (or alternatively another column given via--collection-column
). If this is not used the "(??)" match group is assumed to equal the irods collection names (this mirrors previous behaviour).cubi-tk snappy itransfer
can now take a LZ_path as destination (this was intended based on descriptions in the code but never implemented)cubi-tk sodar ingest-fastq
now fully inherits the possibility of using an LZ_path, LZ_UUID or Sodar porject UUID as a destination from thecubi-tk snappy itransfer
functionscubi-tk sodar ingest-fastq
similarly inherits the possibility to automatically validate & move landing zonesChanged Features:
-m
flag from--remote-dir-mapping
to--sample-collection-mapping
and slightly reduced its scope. Previously this allowed applying a regex to the complete remote irods path (and could in-fact make the path non-viable), which allowed adapting the extracted sample/collection name but also renamed the remote files. Now the regex is applied only to extracted "(??)" match group that determines the collection name (either directly or by matching the final modified values to an assay table column).ingest-fastq
#4 ){collection_name}/raw_data/{date}/{filename}
instead of{sample}/{date}/{filename}
)Removed Features:
--add-suffix
flag since it's only use was adding a string to the end (i.e. after file ending) of the remote filename. This seemed like unwanted behavior to me and the remote filepaths can otherwise be controlled by--remote-dir-pattern
. Renaming of files inside the remote patterns also does not seem to serve any useful purpose.--basepath
flag