* Create ngsDataGrabber.py
  This is a copy of ngsDataGrabber_V0.6.py and it has been renamed to something more generic for ngsQC protocol.
* Create assemblyQC_data_grabber.py file
  This file is created in order to extract numbers of unspecified nucleotides from the backend.
* Update assemblyQC_data_grabber.py
* Bump numpy from 1.21.6 to 1.22.0 in /lib
  Bumps [numpy](https://github.com/numpy/numpy) from 1.21.6 to 1.22.0.
  - [Release notes](https://github.com/numpy/numpy/releases)
  - [Changelog](https://github.com/numpy/numpy/blob/main/doc/RELEASE_WALKTHROUGH.rst)
  - [Commits](numpy/numpy@v1.21.6...v1.22.0)
  ---
  updated-dependencies:
  - dependency-name: numpy
    dependency-type: direct:production
  ...
  Signed-off-by: dependabot[bot] <[email protected]>
* Created assemblyQC_NCBI.tsv
  Changes to be committed:
    new file: data_files/assemblyQC_NCBI.tsv
    new file: lib/assembly_qc_ncbi_.py
* Update requirements.txt
  Changes to be committed:
    modified: requirements.txt
* Publish v0.9
  Changes to be committed:
    new file: data_dictionary/v0.9/README.tsv
    new file: data_dictionary/v0.9/core_property_list.tsv
    new file: data_dictionary/v0.9/non_core_property_list.tsv
    new file: data_dictionary/v0.9/property_definition.tsv
    new file: data_dictionary/v0.9/release_notes.tsv
    new file: schema/v0.9/core/assemblyQC.json
    new file: schema/v0.9/core/biosampleMeta.json
    new file: schema/v0.9/core/ngsQC.json
    new file: schema/v0.9/core/siteQC.json
    new file: schema/v0.9/non-core/SRA_assemblyQC.json
    new file: schema/v0.9/non-core/SRA_biosample.json
    new file: schema/v0.9/non-core/SRA_ngsQC.json
    new file: schema/v0.9/non-core/ngs_ID_list.json
    new file: schema/v0.9/non-core/sars-cov-2_lineage_mutations.json
    new file: schema/v0.9/non-core/uniprot-proteome_*.json

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Jingyue Wu <[email protected]>
Co-authored-by: penningtonea <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
1 parent fe0b01f, commit 007324e
Showing 18 changed files with 3,399 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
This page: V0.9_argos_dict | ||
|
||
property definition | ||
column header data | ||
Property consensus name for data property described in row | ||
Name in Files Alternate names for data property in existing datasets | ||
Data Files A `|` separated list of dataset names where this property is utilized | ||
recommended The person or resource that suggested using the property | ||
Description A definition and additional information about the property. | ||
source/type def The data source for obtaining the property. | ||
|
||
core property list | ||
column header data | ||
Property consensus name for data property described in row | ||
Data Object Type The dataset this property is used in | ||
Optional/Required indicates if the property is REQUIRED to have a valid data row | ||
$id For JSON schema conversion | ||
Title Human readable name for property. Default is the same as property. | ||
Type property type as defined by JSON types | ||
default a default value for property | ||
examples an example for the property | ||
pattern the regular expression evaluation for this property | ||
description A definition and additional information about the property. | ||
|
||
annotation property list | ||
column header data | ||
Property consensus name for data property described in row | ||
Data Object Type The dataset this property is used in | ||
Optional/Required indicates if the property is REQUIRED to have a valid data row | ||
$id For JSON schema conversion | ||
Title Human readable name for property. Default is the same as property. | ||
Type property type as defined by JSON types | ||
default a default value for property | ||
examples an example for the property | ||
pattern the regular expression evaluation for this property | ||
description A definition and additional information about the property. | ||
|
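The column descriptions above (`$id` "For JSON schema conversion", `Type` "as defined by JSON types", `pattern`) suggest how a property-list row could map onto a JSON Schema property. The following is a minimal sketch of that mapping, not the repository's converter. It assumes rows are tab-separated, that the column order follows the property-list header row (`property`, `data_object`, `required`, `id`, `title`, `type`, `default`, `examples`, `pattern`, `description`), and that `-` is a placeholder for "no value". The function name `row_to_schema_property` is hypothetical.

```python
import json

# Column order assumed from the property-list header row.
COLUMNS = ["property", "data_object", "required", "id", "title",
           "type", "default", "examples", "pattern", "description"]


def row_to_schema_property(line):
    """Convert one tab-separated property row into (name, schema, is_required)."""
    values = line.rstrip("\n").split("\t")
    row = dict(zip(COLUMNS, values))
    schema = {"$id": row["id"], "title": row["title"], "type": row["type"]}
    # "-" is treated as an empty placeholder (an assumption, see the lead-in).
    if row["default"] != "-":
        schema["default"] = row["default"]
    if row["examples"] != "-":
        schema["examples"] = [row["examples"]]
    if row["pattern"] != "-":
        schema["pattern"] = row["pattern"]
    if row["description"] != "-":
        schema["description"] = row["description"]
    return row["property"], schema, row["required"] == "required"


# One row from the property list, rebuilt with explicit tab separators.
line = "\t".join(["biosample", "SRA_ngsQC.tsv", "optional", "#root/biosample",
                  "biosample", "string", "-", "SAMN11077980", "^.*$", "-"])
name, schema, is_required = row_to_schema_property(line)
print(name, is_required)
print(json.dumps(schema, indent=2))
```

Returning the required flag separately reflects that JSON Schema lists required properties at the object level rather than on each property.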
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,82 @@ | ||
property data_object required id title type default examples pattern description | ||
sra_experiment_id SRA_ngsQC.tsv required #root/sra_experiment_id sra_experiment_id string - SRX5882434 ^.*$ - | ||
sra_run_id SRA_ngsQC.tsv required #root/sra_run_id sra_run_id string - SRR9107811 ^.*$ - | ||
bioproject SRA_ngsQC.tsv required #root/bioproject bioproject string - PRJNA231221 ^.*$ - | ||
biosample SRA_ngsQC.tsv optional #root/biosample biosample string - SAMN11077980 ^.*$ - | ||
num_of_bases SRA_ngsQC.tsv required #root/num_of_bases num_of_bases string - 400293278 ^.*$ - | ||
file_size SRA_ngsQC.tsv required #root/file_size file_size string - 195921414 ^.*$ - | ||
published SRA_ngsQC.tsv required #root/published published string - ########## ^.*$ - | ||
source SRA_ngsQC.tsv required #root/source source string - VIRAL RNA (Genomic DNA|GENOMIC|TRANSCRIPTOMIC|METAGENOMIC|METATRANSCRIPTOMIC|SYNTHETIC|VIRAL RNA|OTHER) - | ||
strategy SRA_ngsQC.tsv required #root/strategy strategy string - WGS (WGA|WGS|WXS|RNA-Seq|miRNA-Seq|WCS|CLONE|POOLCLONE|AMPLICON|CLONEEND|FINISHING|ChIP-Seq|MNase-Seq|DNase-Hypersensitivity|Bisulfite-Seq|Tn-Seq|EST|FL-cDNA|CTS|MRE-Seq|MeDIP-Seq|MBD-Seq|Synthetic-Long-Read|ATAC-seq|ChIA-PET|FAIRE-seq|Hi-C|ncRNA-Seq|RAD-Seq|RIP-Seq|SELEX|ssRNA-seq|Targeted-Capture|Tethered Chromatin Conformation Capture|OTHER) - | ||
layout SRA_ngsQC.tsv required #root/layout layout string - PAIRED ^.*$ - | ||
library_name SRA_ngsQC.tsv optional #root/library_name library_name string - IL100100209 ^.*$ - | ||
selection SRA_ngsQC.tsv optional #root/selection selection string - cDNA (RANDOM|PCR|RANDOM PCR|RT-PCR|HMPR|MF|CF-S|CF-M|CF-H|CF-T|MDA|MSLL|cDNA|ChIP|MNase|DNAse|Hybrid Selection|Reduced Representation|Restriction Digest|5-methylcytidine antibody|MBD2 protein methyl-CpG binding domain|CAGE|RACE|size fractionation|Padlock probes capture method|other|unspecified|cDNA_oligo_dT|cDNA_randomPriming|Inverse rRNA|Oligo-dT|PolyA|repeat fractionation) - | ||
instrument SRA_ngsQC.tsv optional #root/instrument instrument string - Illumina MiSeq ^.*$ - | ||
file_type SRA_ngsQC.tsv required #root/file_type file_type string - fastq ^.*$ - | ||
reads_unaligned SRA_ngsQC.tsv optional #root/reads_unaligned reads_unaligned string - 61059 ^[2-9]|[1-9]\d+$ - | ||
identified_reads SRA_ngsQC.tsv optional #root/identified_reads identified_reads string - 603880 ^[2-9]|[1-9]\d+$ - | ||
taxonomy_id SRA_ngsQC.tsv optional #root/taxonomy_id taxonomy_id string - 64320 ^[2-9]|[1-9]\d+$ - | ||
lineage SRA_ngsQC.tsv optional #root/lineage lineage string - Viruses|Riboviria|Orthornavirae|Kitrinoviricota|Flasuviricetes|Amarillovirales|Flaviviridae|Flavivirus|Zika virus ^.*$ - | ||
percent_identified SRA_ngsQC.tsv optional #root/percent_identified percent_identified string - 0.90817353 ^[+-]?([0]+\.?[0-9]*|\.[0-9]+)$ - | ||
ngs_gc_content SRA_ngsQC.tsv optional #root/gc_content gc_content string - 0.45566968 ^[+-]?([0]+\.?[0-9]*|\.[0-9]+)$ - | ||
genome_assembly_id SRA_assemblyQC.tsv required #root/genome_assembly_id genome_assembly_id string - GCA_013267415.1 ^.*$ - | ||
level SRA_assemblyQC.tsv required #root/level level string - Complete genome ^.*$ - | ||
assembly_level SRA_assemblyQC.tsv required #root/assembly_level assembly_level string - undefined ^.*$ - | ||
num_chromosomes SRA_assemblyQC.tsv required #root/num_chromosomes num_chromosomes string - 1 ^.*$ - | ||
biosample SRA_assemblyQC.tsv required #root/biosample biosample string - SAMN11056500 ^.*$ - | ||
strain SRA_assemblyQC.tsv required #root/strain strain string - FDAARGOS_785 ^.*$ - | ||
organism_name SRA_assemblyQC.tsv required #root/Taxonomy Name Taxonomy Name string - Abiotrophia defectiva ^.*$ - | ||
bioproject SRA_assemblyQC.tsv required #root/BioPproject BioPproject string - PRJNA231221 ^.*$ - | ||
taxonomy_id SRA_assemblyQC.tsv required #root/taxonomy_id taxonomy_id string - 46125 ^[2-9]|[1-9]\d+$ - | ||
lineage SRA_assemblyQC.tsv required #root/lineage lineage string - cellular organisms; Bacteria; Terrabacteria group; Firmicutes; Bacilli; Lactobacillales; Aerococcaceae; Abiotrophia ^.*$ - | ||
uniprotkb_canonical_ac sars-cov-2_lineage_mutations.tsv required #root/uniprotkb_canonical_ac uniprotkb_canonical_ac string - P0DTC2-1 ^.*$ - | ||
found_in sars-cov-2_lineage_mutations.tsv required #root/found_in found_in string - "Omicron (21K, BA.1)" ^.*$ - | ||
gene_name sars-cov-2_lineage_mutations.tsv required #root/gene_name gene_name string - S ^.*$ - | ||
protein_name sars-cov-2_lineage_mutations.tsv required #root/protein_name protein_name string - Spike glycoprotein ^.*$ - | ||
evidence_ECO0000313 sars-cov-2_lineage_mutations.tsv required #root/evidence_ECO0000313 evidence_ECO0000313 string - CoVariants ^.*$ - | ||
begin_aa_pos sars-cov-2_lineage_mutations.tsv required #root/begin_aa_pos begin_aa_pos string - 67 ^.*$ - | ||
end_aa_pos sars-cov-2_lineage_mutations.tsv required #root/end_aa_pos end_aa_pos string - 67 ^.*$ - | ||
mutation_type sars-cov-2_lineage_mutations.tsv required #root/mutation_type mutation_type string - missense ^.*$ - | ||
is_known_glycoprotein sars-cov-2_lineage_mutations.tsv required #root/is_known_glycoprotein is_known_glycoprotein string - yes ^.*$ - | ||
motif_start sars-cov-2_lineage_mutations.tsv required #root/motif_start motif_start string - 67 ^.*$ - | ||
ref_motif sars-cov-2_lineage_mutations.tsv required #root/ref_motif ref_motif string - A ^.*$ - | ||
alt_motif sars-cov-2_lineage_mutations.tsv required #root/alt_motif alt_motif string - V ^.*$ - | ||
effect sars-cov-2_lineage_mutations.tsv required #root/effect effect string - - ^.*$ - | ||
note sars-cov-2_lineage_mutations.tsv required #root/note note string - - ^.*$ - | ||
organism_name ngs_ID_list.tsv required #root/note organism_name string - Severe acute respiratory syndrome coronavirus 2 ^.*$ - | ||
leaf_node ngs_ID_list.tsv required #root/note Leaf Node string - B.1.1.529/Omicron isolate ^.*$ - | ||
genome_assembly_id ngs_ID_list.tsv required #root/genome_assembly_id genome_assembly_id string - GCA_013267415.1 ^.*$ - | ||
taxonomy_id ngs_ID_list.tsv required #root/taxonomy_id taxonomy_id string - 2697049 ^[2-9]|[1-9]\d+$ - | ||
sra_run_id ngs_ID_list.tsv required #root/note SRA Run ID string - SRR17309642 ^\+SRR.+ - | ||
selection_notes ngs_ID_list.tsv required #root/note Selection Notes string - "For omicron, we are selecting EPI_ISL_6913953. Sequencing was conducted on Illumina MiSeq, has high coverage, and a consistent quality score across all base calls above 30. Raw reads are available at https://www.ncbi.nlm.nih.gov/sra/SRX13486794, and a full description of the patient harboring the virus is supplied with the following publication https://academic.oup.com/cid/advance-article/doi/10.1093/cid/ciab1072/6494531?login=true. The patient was one of the first two known COVID-19 cases classified as omicron in Japan. To put the collection date of 28 November 2021 in perspective, the first known omicron sample was collected on 8 November 2021. Raw reads from South Africa are available, but the average phred quality score is much lower for those samples." ^.*$ - | ||
lab_name ngs_ID_list.tsv required #root/note Lab Name string - Pond Lab (Pond Lab|Crandall Lab|HIVE Lab) - | ||
files_processed ngs_ID_list.tsv required #root/processed processed string - ngsQC_HL|assemblyQC_HL|siteQC_HL|biosampleMeta_HL (ngsQC_HL|assemblyQC_HL|siteQC_HL|biosampleMeta_HL) - | ||
biosample SRA_biosample required #root/biosample biosample string - SAMN16357302 ^.*$ - | ||
organism_name SRA_biosample required #root/organism_name organism_name string - Pediococcus acidilactici ^.*$ - | ||
strain SRA_biosample required #root/strain strain string - FDAARGOS_1133 ^.*$ - | ||
sample_name SRA_biosample required #root/sample_name sample_name string - FDAARGOS_1133 ^.*$ - | ||
taxonomy_id SRA_biosample required #root/taxonomy_id taxonomy_id string - 1254 ^.*$ - | ||
isolate SRA_biosample required #root/isolate isolate string - Not applicable ^.*$ - | ||
collected_by SRA_biosample required #root/collected_by collected_by string - DSMZ ^.*$ - | ||
collection_date SRA_biosample required #root/collection_date collection_date string - Unknown ^.*$ - | ||
geo_loc_name SRA_biosample required #root/geo_loc_name geo_loc_name string - Germany: Braunschweig ^.*$ - | ||
isolation_source SRA_biosample required #root/isolation_source isolation_source string - missing ^.*$ - | ||
lat_lon SRA_biosample required #root/lat_lon lat_lon string - Not applicable ^.*$ - | ||
culture_collection SRA_biosample required #root/culture_collection culture_collection string - DSM:20284 ^.*$ - | ||
host SRA_biosample required #root/host host string - missing ^.*$ - | ||
host_age SRA_biosample required #root/host_age host_age string - Unknown ^.*$ - | ||
host_description SRA_biosample required #root/host_description host_description string - Not applicable ^.*$ - | ||
host_disease SRA_biosample required #root/host_disease host_disease string - Unknown ^.*$ - | ||
host_disease_outcome SRA_biosample required #root/host_disease_outcome host_disease_outcome string - Unknown ^.*$ - | ||
host_disease_stage SRA_biosample required #root/host_disease_stage host_disease_stage string - Unknown ^.*$ - | ||
host_health_state SRA_biosample required #root/host_health_state host_health_state string - Unknown ^.*$ - | ||
host_sex SRA_biosample required #root/host_sex host_sex string - Unknown ^.*$ - | ||
id_method SRA_biosample required #root/id_method id_method string - Phenotypic and Molecular Methods ^.*$ - | ||
type_material SRA_biosample required #root/type_material type_material string - neotype strain of Pediococcus acidilactici ^.*$ - | ||
uniprotkb_ac uniprot-proteome_*.csv required #root/uniprotkb_ac uniprotkb_ac string - P03466 ^.*$ - | ||
entry_name uniprot-proteome_*.csv required #root/entry_name entry_name string - RDRP_I34A1 ^.*$ - | ||
status uniprot-proteome_*.csv required #root/status status string - reviewed ^.*$ - | ||
protein_names uniprot-proteome_*.csv required #root/protein_names protein_names string - RNA-directed RNA polymerase catalytic subunit (EC 2.7.7.48) (Polymerase basic protein 1) (PB1) (RNA-directed RNA polymerase subunit P1) ^.*$ - | ||
gene_names uniprot-proteome_*.csv required #root/gene_names gene_names string - PB1 ^.*$ - | ||
organism_name uniprot-proteome_*.csv required #root/organism organism string - Influenza A virus (strain A/Puerto Rico/8/1934 H1N1) ^.*$ - | ||
length uniprot-proteome_*.csv required #root/length length string - 498 ^.*$ - |
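Each row above pairs an example value with a pattern, so the two can be checked against each other. The sketch below does this for a few rows using Python's `re.search`, which mirrors the unanchored matching that JSON Schema's `pattern` keyword uses. The helper name `example_matches` and the grouped `strict` pattern are illustrative assumptions and do not appear in the commit.

```python
import re

# (property, example, pattern) triples copied from rows of the list above.
ROWS = [
    ("taxonomy_id", "64320", r"^[2-9]|[1-9]\d+$"),
    ("percent_identified", "0.90817353", r"^[+-]?([0]+\.?[0-9]*|\.[0-9]+)$"),
    ("lab_name", "Pond Lab", r"(Pond Lab|Crandall Lab|HIVE Lab)"),
    ("sra_run_id", "SRR17309642", r"^\+SRR.+"),
]


def example_matches(example, pattern):
    """Return True if the pattern is found anywhere in the example."""
    return re.search(pattern, example) is not None


for name, example, pattern in ROWS:
    print(name, example_matches(example, pattern))

# Alternation binds loosest: "^" guards only the first branch, "$" only the second.
loose = r"^[2-9]|[1-9]\d+$"
strict = r"^(?:[2-9]|[1-9]\d+)$"  # suggested grouping, not in the source
print(example_matches("2abc", loose), example_matches("2abc", strict))
```

Two things stand out when this runs. The `sra_run_id` example does not match its own pattern, because `\+` requires a literal plus sign before `SRR`. The `taxonomy_id` pattern accepts strings such as `2abc`, because the anchors apply to one alternative each; wrapping the alternation in a group, as in `strict`, would anchor both ends.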