Pysradb to fetch SRA metadata? #96

Open
markziemann opened this issue Nov 29, 2021 · 2 comments

Comments

@markziemann
Owner

pysradb looks like a more stable alternative for fetching SRA metadata: https://github.com/saketkc/pysradb

@markziemann
Owner Author

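library(tictoc)   # tic()/toc() timing
library(XML)      # xmlToList()
library(reutils)  # esearch(), esummary(), content(), uid()

tic()
# Search SRA for public E. coli transcriptomic records
eres <- esearch("Escherichia coli[orgn] AND transcriptomic[Source] AND public[Access]", db = "sra", retmax = 999000)
str(uid(eres))
# Fetch the document summaries and parse them
esum <- esummary(eres)
econtent <- content(esum, "parsed")
# A Runs entry can hold several concatenated <Run .../> elements.
# Pad each "><" boundary so that splitting on "><" keeps the angle brackets.
runvec <- econtent$Runs
runvec <- gsub("><", ">><<", runvec)
runvec <- unlist(strsplit(runvec, "><"))
# Convert each <Run> element to a vector and stack the results row-wise
runs <- lapply(runvec, function(x) as.vector(xmlToList(x)))
runs <- do.call(rbind, runs)
toc()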

@markziemann
Owner Author

A simpler approach using the pysradb command line:

pysradb search --organism="Escherichia coli" --source="transcriptomic" --max=999000 > ecoli.tsv
awk '{print $(NF-2)}' ecoli.tsv > ecoli_runs.tsv 
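
The awk call above picks the run accession by position (third field from the end) and splits on any whitespace, so it can pick the wrong field if a value contains spaces or the column order changes. A rough sketch of selecting the column by header name instead is below. It assumes the file is tab-separated with a header row. `extract_column` is a hypothetical helper, not part of pysradb, and the column name in the usage line is an assumption to check against the actual header of ecoli.tsv.

```shell
# extract_column FILE COLUMN_NAME   (hypothetical helper, not part of pysradb)
# Prints the values of the named column from a tab-separated file
# whose first line is a header row. Exits non-zero if the column is absent.
extract_column() {
  awk -F'\t' -v col="$2" '
    NR == 1 {
      # Locate the requested column in the header row
      for (i = 1; i <= NF; i++) if ($i == col) idx = i
      if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
      next
    }
    { print $idx }
  ' "$1"
}
```

Usage would look like `extract_column ecoli.tsv run_1_accession > ecoli_runs.tsv`, where `run_1_accession` is the assumed header name.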
