Commit
updated expression docker and FSES headings
FSES data said 'environment' not 'environmental'. The enrichR package was removed from CRAN, so expression data was not being built.
sgosline committed Nov 7, 2024
1 parent 3df3551 commit bcdb2f9
Showing 5 changed files with 38 additions and 15 deletions.
21 changes: 21 additions & 0 deletions data/README.md
@@ -1,9 +1,30 @@
## Data directory
This directory contains existing data and reference files to be added to the SRP analytics database. These data are not private in any way, but mainly serve as reference files for the broader repository.

### Data Schema file
Incoming data requires a pre-defined schema so we can validate things
BEFORE they are going to be processed by our pipeline. This schema
also enables an easy way to check if a file CAN be processed before
adding it to the list.
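
As a rough illustration of this kind of pre-check, the base R sketch below reads only a file's header and reports which required columns are absent. The helper name `missing_columns` and the shortened column list are assumptions made for this example; the full list used by the pipeline is `required_sample_columns` in `sampleChemMapping/mapSamplesToChems.R`.

```r
# Hypothetical helper (not part of the repository): list the required
# columns that do not appear in a CSV file's header.
required_columns <- c("SampleNumber", "cas_number", "measurement_value",
                      "environmental_concentration")

missing_columns <- function(path, required) {
  # Read the header plus one data row; check.names = FALSE keeps the
  # column names exactly as written in the file.
  header <- names(read.csv(path, nrows = 1, check.names = FALSE))
  setdiff(required, header)
}

# Example: a temporary file that still uses the old 'environment_' prefix.
tmp <- tempfile(fileext = ".csv")
writeLines(c("SampleNumber,cas_number,measurement_value,environment_concentration",
             "A181010,120-82-1,50,NA"), tmp)
missing <- missing_columns(tmp, required_columns)
print(missing)
```

A file would be safe to add to the list only when the returned vector is empty; here the check flags `environmental_concentration` because the example header uses the old name.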

### Data file manifest
The data file manifest contains a link to all the other data files,
including those referenced in this document. This file can be found
[here](./srp_build_files.csv).

### Chemical reference information

Many of the build files are actual reference information.

#### Chemical id file
#### Chemical classification file
#### compTox file

### Environmental sample information

#### Sample id file

### Existing Zebrafish data


####
2 changes: 1 addition & 1 deletion data/fses/FSES_indoor_outdoor_study.csv
@@ -1,4 +1,4 @@
"","SampleNumber","date_sampled","date_sample_start","sample_matrix","technology","Sample_ID","zf_lims_id","cas_number","ClientName","SampleName","LocationLat","LocationLon","LocationName","LocationAlternateDescription","AlternateName","Chemical_ID","measurement_value","measurement_value_qualifier","measurement_value_unit","measurement_value_molar","measurement_value_molar_unit","environment_concentration","environment_concentration_qualifier","environment_concentration_unit","environment_concentration_molar","environment_concentration_molar_unit","parentSampleNumber","childSampleNumber","projectName","projectLink"
"","SampleNumber","date_sampled","date_sample_start","sample_matrix","technology","Sample_ID","zf_lims_id","cas_number","ClientName","SampleName","LocationLat","LocationLon","LocationName","LocationAlternateDescription","AlternateName","Chemical_ID","measurement_value","measurement_value_qualifier","measurement_value_unit","measurement_value_molar","measurement_value_molar_unit","environmental_concentration","environmental_concentration_qualifier","environmental_concentration_unit","environmental_concentration_molar","environmental_concentration_molar_unit","parentSampleNumber","childSampleNumber","projectName","projectLink"
"1","A181010","2018-09-30 11:43:00","2018-09-05 00:00:00","PSD-Air","GC-MS - RTL DRS Screening - MASV15",NA,NA,"120-82-1","FSES Laboratory","La Pine, OR Outdoor",43.6704,121.5036,"LaPine, OR",NA,NA,NA,"<50 J","U","pg/uL","<276 J","pmol/mL",NA,NA,NA,NA,NA,NA,NA,"Forest Fire Citizen Science 2018 (Pilot)","http://fses.oregonstate.edu/research/indoor-outdoor-air-quality"
"2","A181010","2018-09-30 11:43:00","2018-09-05 00:00:00","PSD-Air","GC-MS - RTL DRS Screening - MASV15",NA,NA,"96-12-8","FSES Laboratory","La Pine, OR Outdoor",43.6704,121.5036,"LaPine, OR",NA,NA,NA,"<100 J","U","pg/uL","<423 J","pmol/mL",NA,NA,NA,NA,NA,NA,NA,"Forest Fire Citizen Science 2018 (Pilot)","http://fses.oregonstate.edu/research/indoor-outdoor-air-quality"
"3","A181010","2018-09-30 11:43:00","2018-09-05 00:00:00","PSD-Air","GC-MS - RTL DRS Screening - MASV15",NA,NA,"95-50-1","FSES Laboratory","La Pine, OR Outdoor",43.6704,121.5036,"LaPine, OR",NA,NA,NA,"<50 J","U","pg/uL","<340 J","pmol/mL",NA,NA,NA,NA,NA,NA,NA,"Forest Fire Citizen Science 2018 (Pilot)","http://fses.oregonstate.edu/research/indoor-outdoor-air-quality"
2 changes: 1 addition & 1 deletion data/fses/fses_data_for_pnnl_4-27-2021.csv
@@ -1,4 +1,4 @@
SampleNumber,date_sampled,sample_matrix,technology,Sample_ID,zf_lims_id,cas_number,ClientName,SampleName,LocationLat,LocationLon,LocationName,LocationAlternateDescription,AlternateName,Chemical_ID,measurement_value,measurement_value_qualifier,measurement_value_unit,measurement_value_molar,measurement_value_molar_unit,environment_concentration,environment_concentration_qualifier,environment_concentration_unit,environment_concentration_molar,environment_concentration_molar_unit,parentSampleNumber,childSampleNumber,projectName,date_sample_start,projectLink
enSampleNumber,date_sampled,sample_matrix,technology,Sample_ID,zf_lims_id,cas_number,ClientName,SampleName,LocationLat,LocationLon,LocationName,LocationAlternateDescription,AlternateName,Chemical_ID,measurement_value,measurement_value_qualifier,measurement_value_unit,measurement_value_molar,measurement_value_molar_unit,environmental_concentration,environmental_concentration_qualifier,environmental_concentration_unit,environmental_concentration_molar,environmental_concentration_molar_unit,parentSampleNumber,childSampleNumber,projectName,date_sample_start,projectLink
10JUL11-01-007,2010-07-07 8:00:00,PSD-Water,LFT - Spiked,NULL,10JUL11-01-015,120-12-7,NULL,LA-LFT-W-SPK,29.2611,-89.2611,"Grand Isle, Louisiana",NULL,"Grand Isle, LA",0,0,NULL,pg/uL,0,pmol/mL,BDL,NULL,ng/L,nc:BDL,nmol/L,NULL,A100078,NA,NA,NA
10JUL11-01-007,2010-07-07 8:00:00,PSD-Water,LFT - Spiked,NULL,10JUL11-01-015,129-00-0,NULL,LA-LFT-W-SPK,29.2611,-89.2611,"Grand Isle, Louisiana",NULL,"Grand Isle, LA",0,32200,NULL,pg/uL,159208,pmol/mL,14.8,NULL,ng/L,0.0731765,nmol/L,NULL,A100078,NA,NA,NA
10JUL11-01-007,2010-07-07 8:00:00,PSD-Water,LFT - Spiked,NULL,10JUL11-01-015,132-65-0,NULL,LA-LFT-W-SPK,29.2611,-89.2611,"Grand Isle, Louisiana",NULL,"Grand Isle, LA",0,309,NULL,pg/uL,1677.01,pmol/mL,0.462,NULL,ng/L,0.00250737,nmol/L,NULL,A100078,NA,NA,NA
22 changes: 11 additions & 11 deletions sampleChemMapping/mapSamplesToChems.R
@@ -5,7 +5,7 @@ require(rio)
require(argparse)
require(xml2)
library(tidyr)
##The data release will be comprised of 9 files (note change from v1!)
##The data release will be comprised of 9 files (note change fro v1!)
#' 1- list of environmental samples and the chemical composition (curated sample data)
#' 2- ZF summary statistics for chemicals (and no chemical metadata)
#' 3- ZF points to plot for chemicals
@@ -27,13 +27,13 @@ required_sample_columns<-c("ClientName","SampleNumber","date_sampled","sample_ma
"AlternateName","cas_number","date_sample_start",
"measurement_value","measurement_value_qualifier","measurement_value_unit",
"measurement_value_molar","measurement_value_molar_unit",
'environment_concentration','environment_concentration_qualifier','environment_concentration_unit',
'environment_concentration_molar','environment_concentration_molar_unit')
'environmental_concentration','environmental_concentration_qualifier','environmental_concentration_unit',
'environmental_concentration_molar','environmental_concentration_molar_unit')

#we need to rename the water columns
#new_sample_columns=c(environment_concentration="water_concentration",environment_concentration_qualifier='water_concentration_qualifier',
# environment_concentration_unit='water_concentration_unit',environment_concentration_molar='water_concentration_molar',
# environment_concentration_molar_unit='water_concentration_molar_unit')
#new_sample_columns=c(environmental_concentration="water_concentration",environmental_concentration_qualifier='water_concentration_qualifier',
# environmental_concentration_unit='water_concentration_unit',environmental_concentration_molar='water_concentration_molar',
# environmental_concentration_molar_unit='water_concentration_molar_unit')

##required for comptox-derived mapping files
required_comptox_columns <- c("INPUT","DTXSID","PREFERRED_NAME","INCHIKEY","SMILES","MOLECULAR_FORMULA",
@@ -43,8 +43,8 @@ required_comptox_columns <- c("INPUT","DTXSID","PREFERRED_NAME","INCHIKEY","SMIL
##output tables
sample_chem_columns <-c('Sample_ID','Chemical_ID',"measurement_value","measurement_value_qualifier","measurement_value_unit",
"measurement_value_molar","measurement_value_molar_unit",
"environment_concentration","environment_concentration_qualifier","environment_concentration_unit",
"environment_concentration_molar","environment_concentration_molar_unit")
"environmental_concentration","environmental_concentration_qualifier","environmental_concentration_unit",
"environmental_concentration_molar","environmental_concentration_molar_unit")

samp_columns <-c("Sample_ID","ClientName","SampleNumber","date_sampled","sample_matrix","technology",
"projectName","SampleName","LocationLat","projectLink",
@@ -325,10 +325,10 @@ buildSampleData<-function(fses_files, #files from barton that contain sample inf
# dplyr::rename(new_sample_columns)|> ##REMOVE this once we have new names
subset(SampleNumber!='None')%>%
subset(cas_number!='NULL')%>%
mutate(environment_concentration_molar=stringr::str_replace_all(environment_concentration_molar,'BLOD|NULL|nc:BDL',"0"))%>%
mutate(environmental_concentration_molar=stringr::str_replace_all(environmental_concentration_molar,'BLOD|NULL|nc:BDL',"0"))%>%
mutate(measurement_value_molar=stringr::str_replace_all(measurement_value_molar,'BLOD|NULL|BDL',"0"))%>%
mutate(environment_concentration=stringr::str_replace_all(environment_concentration,'BLOD|NULL|BDL',"0"))%>%
# subset(environment_concentration_molar!='0.0')%>%
mutate(environmental_concentration=stringr::str_replace_all(environmental_concentration,'BLOD|NULL|BDL',"0"))%>%
# subset(environmental_concentration_molar!='0.0')%>%
subset(!measurement_value_molar%in%c('0'))%>%
subset(!measurement_value%in%c("0","NULL",""))#%>%
# select(-c(Sample_ID))#,Chemical_ID)) ##These two are added in the 4/27 version of the file
6 changes: 4 additions & 2 deletions zfExp/Dockerfile
@@ -4,7 +4,7 @@ FROM r-base:4.4.1

RUN apt-get update --allow-insecure-repositories
RUN apt-get install -y --allow-unauthenticated --fix-missing python3-pip python3-setuptools python3-dev python3-venv libcurl4-openssl-dev libglpk-dev libxml2-dev libpq-dev

RUN apt-get install -y libmariadb-dev-compat libmariadb-dev

RUN python3 -m venv /opt/venv

@@ -26,7 +26,9 @@ RUN Rscript -e "install.packages('readxl',dependencies=TRUE, repos='http://cran.
RUN Rscript -e "install.packages('dplyr',dependencies=TRUE, repos='http://cran.rstudio.com')"
RUN Rscript -e "install.packages('tidyr',dependencies=TRUE, repos='http://cran.rstudio.com')"
RUN Rscript -e "install.packages('rio',dependencies=TRUE, repos='http://cran.rstudio.com')"
RUN Rscript -e "install.packages('enrichR', dependencies=TRUE, repos='http://cran.rstudio.com')"
RUN Rscript -e "install.packages('rjson',dependencies=TRUE, repos='http://cran.rstudio.com')"
RUN Rscript -e "install.packages('WriteXLS',dependencies=TRUE, repos='http://cran.rstudio.com')"
RUN Rscript -e "install.packages('https://cran.r-project.org/src/contrib/Archive/enrichR/enrichR_3.2.tar.gz', dependencies=TRUE, source=TRUE)"
RUN Rscript -e "install.packages('argparse',dependencies=TRUE,repos='http://cran.rstudio.com')"

RUN pip3 install --upgrade pip