- Browse or search for the sequences you need (e.g., 16S) at
  https://data.neonscience.org/static/browse.html
  and note the product ID.
- Run the R script with a product ID as input to get the CSV tables containing all raw sequence URLs. To process several products, loop over their IDs.

      # the script path is quoted because it contains an apostrophe
      Rscript "/PATH/TO/FAN'S/get_neon_csv_w_rawseq_urls.R" "DP1.20086.001"
      # remove empty files (here: any .sh file smaller than 40k)
      find . -name "*.sh" -size -40k -delete
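Looping over product IDs could look like the sketch below. The function name `fetch_neon_tables` and the `RSCRIPT` override variable are hypothetical, introduced only for this illustration; the R script itself is called exactly as in the step above.

```shell
# Hypothetical wrapper: run the R script once per product ID.
# RSCRIPT is an assumed override so the loop can be dry-run without R installed.
fetch_neon_tables() {
  script=$1
  shift
  for product_id in "$@"; do
    "${RSCRIPT:-Rscript}" "$script" "$product_id" \
      || echo "failed for product ID: $product_id" >&2
  done
}

# Dry run: print the arguments instead of invoking Rscript.
RSCRIPT=echo fetch_neon_tables "/PATH/TO/FAN'S/get_neon_csv_w_rawseq_urls.R" "DP1.20086.001"
```

For a real run, drop `RSCRIPT=echo` and append any further product IDs as extra arguments.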
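- Create a new directory to hold all the CSV tables, then download them into it by running the curl script(s) generated in step 2.

      mkdir neon_csv && cd neon_csv
      # run each generated script in turn (there is one per product ID if step 2 was looped)
      for f in ../*_curl_command.sh; do bash "$f"; done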
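- Combine all downloaded CSV files, extract the URLs into a list, and generate another curl script from that list.

      cd neon_csv   # skip if you are already in neon_csv
      # drop header lines (those containing "uid"), take the 4th field from the end, strip the quotes
      cat *.csv | sed '/uid/d' | rev | cut -d "," -f 4 | rev | cut -d '"' -f 2 > ../all_fq.url
      while read -r line; do echo "curl -O $line"; done < ../all_fq.url > ../get_all_fq.sh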
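To see what the extraction pipeline does, it can be tried on a tiny made-up table. The column names and URL below are invented for the demonstration and are not the real NEON schema; the only assumption carried over from the step above is that the URL sits in the fourth field from the end. The helper name `extract_urls` is hypothetical.

```shell
# Hypothetical helper wrapping the same pipeline as the step above.
extract_urls() {
  sed '/uid/d' "$@" | rev | cut -d "," -f 4 | rev | cut -d '"' -f 2
}

# Made-up two-line CSV: a header (contains "uid") and one data row.
csv_demo=$(mktemp -d)
printf '%s\n' \
  '"uid","sampleID","rawDataFilePath","a","b","c"' \
  '"1","S1","https://example.org/S1.fastq.tar.gz","x","y","z"' > "$csv_demo/demo.csv"

extract_urls "$csv_demo"/*.csv   # prints https://example.org/S1.fastq.tar.gz
```

Because the fields are split on every comma, a comma inside one of the last quoted fields would shift the count, so it is worth spot-checking `all_fq.url` before downloading.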
- Download the files.

      cd neon_csv   # skip if you are already in neon_csv
      mkdir ../neon_fqs && cd ../neon_fqs
      bash ../get_all_fq.sh
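For large downloads, a slightly more defensive curl script may be useful. The sketch below is an optional variant, not the original workflow: it adds `-f` (fail on HTTP errors instead of saving an error page), `-L` (follow redirects) and `--retry 3`, and it single-quotes each URL. The function name `make_curl_script` is hypothetical, and the quoting assumes the URLs contain no single quotes.

```shell
# Hypothetical generator: read URLs on stdin, write one curl command per line.
make_curl_script() {
  while IFS= read -r url; do
    [ -n "$url" ] || continue   # skip blank lines
    printf "curl -fL --retry 3 -O '%s'\n" "$url"
  done
}

# Demonstration with a made-up URL.
printf '%s\n' 'https://example.org/S1.fastq.tar.gz' | make_curl_script
```

In the workflow above this would be used as `make_curl_script < ../all_fq.url > ../get_all_fq.sh`.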
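- Untar each archive into a new directory named after it, and get rid of all unneeded directory levels.

      cd neon_fqs   # skip if you are already in neon_fqs
      for i in *.tar.gz; do d="${i%.fastq.tar.gz}"; mkdir "$d"; tar -zxvf "$i" -C "$d"; done
      # remove gz files
      rm *.gz
      # move the fastq files to the top of each directory; the subshell keeps the working directory unchanged
      for i in */; do (cd "$i" && find . -mindepth 2 -type f -name "*.fastq" -print0 | xargs -0 -I {} mv {} .); done
      # remove the now-empty hpc directory tree
      for i in */; do rm -r "${i}hpc"; done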
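The unpack-and-flatten step can be rehearsed on a throwaway archive before touching real data. The sketch below assumes, as the step above does, that each archive unpacks into an `hpc/` subtree; the function name `unpack_and_flatten` and the demo file names are hypothetical.

```shell
# Hypothetical helper: unpack every *.fastq.tar.gz in directory $1 into its own
# folder, move the .fastq files to the top of that folder, drop hpc/ and the archive.
unpack_and_flatten() {
  (
    cd "$1" || exit 1
    for archive in *.fastq.tar.gz; do
      [ -e "$archive" ] || continue          # no archives: nothing to do
      dir="${archive%.fastq.tar.gz}"         # strip the suffix to get the folder name
      mkdir -p "$dir"
      tar -zxf "$archive" -C "$dir"
      find "$dir" -mindepth 2 -type f -name "*.fastq" -exec mv {} "$dir"/ \;
      rm -rf "$dir/hpc"
      rm "$archive"
    done
  )
}

# Build a small fake archive with a nested hpc/ layout, then process it.
fq_demo=$(mktemp -d)
mkdir -p "$fq_demo/src/hpc/run1"
echo "@read1" > "$fq_demo/src/hpc/run1/sampleA_R1.fastq"
tar -zcf "$fq_demo/sampleA.fastq.tar.gz" -C "$fq_demo/src" hpc
rm -r "$fq_demo/src"

unpack_and_flatten "$fq_demo"
ls "$fq_demo/sampleA"
```

Unlike the step above, this variant deletes each archive right after unpacking it, which keeps peak disk usage lower.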
- The fastq names can be linked to DNA sample IDs, dates, and sites using the CSV tables from step 3.
- NEON field site to state information is available here.
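One way to do that linking is to search the CSV tables for the archive name, since each unpacked directory is named after an archive whose URL came from those tables. This is only a sketch: the function name `lookup_sample`, the column names and the rows are made up, and it assumes the archive name appears verbatim in the matching CSV row.

```shell
# Hypothetical helper: print the CSV rows that mention an archive or directory name.
lookup_sample() {
  name=${1%/}                   # tolerate a trailing slash, e.g. from "ls -d */"
  name=${name%.fastq.tar.gz}    # accept the archive name or the directory name
  shift
  grep -h -F -- "$name" "$@"
}

# Made-up table with two rows.
link_demo=$(mktemp -d)
printf '%s\n' \
  '"uid","dnaSampleID","rawDataFilePath"' \
  '"1","SITE_001","https://example.org/sampleA.fastq.tar.gz"' \
  '"2","SITE_002","https://example.org/sampleB.fastq.tar.gz"' > "$link_demo/demo.csv"

lookup_sample "sampleA/" "$link_demo"/*.csv
```

The match is a plain substring search, so a name that is a prefix of another (for example `sampleA` and `sampleAB`) returns both rows.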
germs-lab/neon-data-tools