neon-data-tools

Fan's version of the workflow for retrieving the available NEON raw sequences.

Browse or search for the sequences you need at https://data.neonscience.org/static/browse.html (e.g., 16S) and note the product ID.

Run the R script with a product ID as input to get the CSV tables containing all raw sequence URLs. Several IDs can be processed in a loop.

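```
# the path is quoted because the apostrophe would otherwise open a shell quote
Rscript "/PATH/TO/FAN'S/get_neon_csv_w_rawseq_urls.R" "DP1.20086.001"
# remove empty files: delete generated *.sh scripts smaller than 40 KiB
find . -name "*.sh" -size -40k -delete
```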
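
Looping over several product IDs could look like the sketch below. The function name `run_for_product_ids` and the `RSCRIPT_BIN` override are assumptions made for this example and are not part of the R script; only `DP1.20086.001` comes from this README.

```shell
# Sketch: call the R script once per product ID.
# run_for_product_ids and RSCRIPT_BIN are hypothetical names for this example.
run_for_product_ids() {
  r_script=$1
  shift
  for id in "$@"; do
    # RSCRIPT_BIN lets you swap in another command (e.g. echo for a dry run)
    "${RSCRIPT_BIN:-Rscript}" "$r_script" "$id" || echo "failed: $id" >&2
  done
}

# Usage (pass further product IDs as extra arguments):
# run_for_product_ids "/PATH/TO/FAN'S/get_neon_csv_w_rawseq_urls.R" "DP1.20086.001"
```

Setting `RSCRIPT_BIN=echo` first prints the calls instead of running them, which is a cheap way to check the ID list.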

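Create a new directory to hold the CSV tables of URLs, then download them by running the `*_curl_command.sh` scripts left over from the previous step.
```
mkdir neon_csv && cd neon_csv
# run each curl script in turn (there may be several if IDs were looped)
for f in ../*_curl_command.sh; do bash "$f"; done
```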

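Combine all downloaded CSV files, extract the URLs into a list, and generate another curl script.

```
cd neon_csv
# drop lines containing "uid" (the header rows), take the 4th comma-separated
# field from the end, and strip the surrounding double quotes
cat *.csv | sed '/uid/d' | rev | cut -d "," -f 4 | rev | cut -d '"' -f 2 > ../all_fq.url
# write one curl command per URL
while IFS= read -r line; do echo "curl -O $line"; done < ../all_fq.url > ../get_all_fq.sh
```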
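
The `rev | cut | rev` chain picks the fourth comma-separated field counted from the end of each row. A single `awk` call can do roughly the same thing; the sketch below is an alternative, not the original method. It assumes each CSV has exactly one header line, that the URL really is the fourth column from the end, and that no field contains an embedded comma. `extract_fq_urls` is a hypothetical helper name.

```shell
# Sketch: print the 4th-from-last column of each CSV row, without quotes.
# extract_fq_urls is a hypothetical helper name.
extract_fq_urls() {
  awk -F',' '
    FNR == 1 { next }            # skip the header line of each file
    NF >= 4 {
      url = $(NF - 3)            # 4th comma-separated field from the end
      gsub(/"/, "", url)         # strip the double quotes
      print url
    }
  ' "$@"
}

# Usage: extract_fq_urls neon_csv/*.csv > all_fq.url
```

Unlike `sed '/uid/d'`, this skips only the first line of each file, so a data row that happens to contain the text `uid` is kept.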

Download the sequence archives into a new directory.

```
# run from the directory that contains neon_csv and get_all_fq.sh
mkdir neon_fqs && cd neon_fqs
bash ../get_all_fq.sh
```
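
Large batches of downloads sometimes end with a few files missing. As a rough check, the sketch below lists every URL whose file is absent or empty in the download directory. It assumes each URL ends in the file name that `curl -O` saved (no query string); `list_missing_downloads` is a hypothetical helper name.

```shell
# Sketch: print URLs from the list that have no non-empty file in the target dir.
# list_missing_downloads is a hypothetical helper name.
list_missing_downloads() {
  url_list=$1
  fq_dir=$2
  while IFS= read -r url; do
    [ -n "$url" ] || continue          # ignore blank lines
    file=${url##*/}                    # last path component of the URL
    [ -s "$fq_dir/$file" ] || printf '%s\n' "$url"
  done < "$url_list"
}

# Usage: list_missing_downloads all_fq.url neon_fqs > missing.url
```

The resulting list can be fed through the same `while read` loop as before to build a retry script.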

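Untar each archive into its own directory and get rid of all unneeded directory levels.

```
cd neon_fqs
# extract each archive into a directory named after it (minus .fastq.tar.gz)
for i in *.tar.gz; do d="${i%.fastq.tar.gz}"; mkdir -p "$d" && tar -zxvf "$i" -C "$d"; done
# remove gz files
rm *.gz
# move every nested fastq file up to the top of its own directory
for i in */; do (cd "$i" && find . -mindepth 2 -type f -name "*.fastq" -exec mv {} . \;); done
# remove the leftover hpc directory tree
for i in */; do rm -r "${i}hpc"; done
```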
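
After flattening, it can help to confirm that every directory ended up with fastq files at its top level. The sketch below prints a tab-separated count per directory; `count_fastq_per_dir` is a hypothetical helper name, and it assumes a `find` that supports `-maxdepth` (GNU and BSD versions do).

```shell
# Sketch: print "<directory><TAB><number of top-level .fastq files>" per subdirectory.
# count_fastq_per_dir is a hypothetical helper name.
count_fastq_per_dir() {
  for d in "${1:-.}"/*/; do
    [ -d "$d" ] || continue            # skip the unexpanded glob if there are no dirs
    n=$(find "$d" -maxdepth 1 -type f -name '*.fastq' | wc -l | tr -d ' ')
    printf '%s\t%s\n' "$(basename "$d")" "$n"
  done
}

# Usage: count_fastq_per_dir neon_fqs
```

A count of 0 suggests that the archive had a different internal layout or that extraction failed for that directory.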

The fastq names can be linked to DNA sample IDs, dates, and sites using the CSV tables downloaded into `neon_csv`.
- NEON field site to state information is in `neon_field-sites.csv` in this repository.
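
One simple way to do that lookup is to search the CSV tables for a file name. The sketch below assumes the name (or a distinctive part of it) appears verbatim somewhere in the matching CSV row, which this README does not confirm; `find_sample_rows` is a hypothetical helper name.

```shell
# Sketch: print every CSV row that mentions the given name as a fixed string.
# find_sample_rows is a hypothetical helper name.
find_sample_rows() {
  name=$1
  csv_dir=$2
  # -F: treat the name literally, -h: omit file names from the output
  grep -hF -- "$name" "$csv_dir"/*.csv
}

# Usage: find_sample_rows "SOME_FASTQ_NAME" neon_csv
```

The matching rows can then be read alongside the CSV header to pick out the sample ID, date, and site columns.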
