Trim (#35)
* file out

* cleaning
raysinensis authored Apr 14, 2021
1 parent 75df90c commit 11c6028
Showing 8 changed files with 102 additions and 39,886 deletions.
16 changes: 16 additions & 0 deletions README.Rmd
@@ -139,6 +139,22 @@ sets_10x_withmeta <- sets_10x %>%
    str_detect(str_to_lower(samplemetainfo_1), "meta|annot|type|clustering|coldata") |
      str_detect(str_to_lower(additionalfiles), "meta|annot|type|clustering|coldata")
  )
sets_final <- sets_10x %>%
  mutate(has_meta = ifelse(
    str_detect(str_to_lower(samplemetainfo_1), "meta|annot|type|clustering|coldata") |
      str_detect(str_to_lower(additionalfiles), "meta|annot|type|clustering|coldata"),
    "yes",
    "no")) %>%
  mutate(has_meta = ifelse(
    is.na(has_meta), "no", has_meta
  )) %>%
  select(ID, ReleaseDate, PubmedID, Species,
         files = samplemetainfo_1, type = samplemetainfo_2,
         has_meta) %>%
  filter(str_sub(ReleaseDate, 1, 4) != "2021")
# write_tsv(sets_final, "arrayexpres_analysis_2020.tsv.gz")
```

Current fraction in GEO with metadata: **`r (gds$usable == "yes") %>% mean(na.rm = TRUE)`**. In comparison, for ArrayExpress 10x datasets, the fraction is **`r nrow(sets_10x_withmeta) / nrow(sets_10x)`**.
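
The two chained `mutate()` calls in the README.Rmd hunk are needed because `str_detect()` returns `NA` for missing input, so `ifelse()` yields `NA` wherever neither column gives a definite match. A minimal sketch of that behavior on made-up rows follows. The `toy` tibble, its file names, and the use of `coalesce()` to fold the two steps into one are illustrative assumptions, not what the commit does.

```r
library(dplyr)
library(stringr)

# Hypothetical rows standing in for sets_10x; the file names are made up.
toy <- tibble(
  samplemetainfo_1 = c("cell_metadata.tsv", "counts.mtx", NA, NA),
  additionalfiles  = c(NA, NA, "annotation.csv", NA)
)

pattern <- "meta|annot|type|clustering|coldata"

toy_flagged <- toy %>%
  mutate(
    # TRUE | NA is TRUE, but FALSE | NA and NA | NA are NA
    raw = str_detect(str_to_lower(samplemetainfo_1), pattern) |
      str_detect(str_to_lower(additionalfiles), pattern),
    # coalesce() turns the remaining NA into FALSE before labelling
    has_meta = ifelse(coalesce(raw, FALSE), "yes", "no")
  )

print(toy_flagged)
```

Under these assumptions the result matches what the commit's second `mutate()` achieves: rows with no definite match end up labelled "no" rather than `NA`.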
37,397 changes: 0 additions & 37,397 deletions inst/extdata/101220/gds_result_101220.txt

This file was deleted.

Binary file removed inst/extdata/101220/geo_101220.rds
Binary file not shown.
2,322 changes: 0 additions & 2,322 deletions inst/extdata/101220/get_geo.html

This file was deleted.

Binary file removed inst/extdata/previews.rds
Binary file not shown.
37 changes: 37 additions & 0 deletions inst/manuscript/data_readme.txt
@@ -0,0 +1,37 @@
Numeric data underlying Figure 2 and analyses of Puntambekar et al. 2021

fig2a.tsv:
id : GEO accession
year : year of publication
usable : single cell dataset has metadata
usable2 : format in which the metadata is stored

fig2b.tsv:
id : GEO accession
journal : published journal
journal_group : subgrouping of journal

fig2c.tsv:
id : GEO accession
usable : single cell dataset has metadata
journal : published journal
cite : number of citations
year : year of publication
ifs : journal impact factor

fig2d.tsv:
id : GEO accession
usable : single cell dataset has metadata
usable2 : format in which the metadata is stored
software_author : whether an author is involved with scRNA-seq software development

geo_vs_Svensson.tsv:
id : GEO accession
useable : single cell dataset has metadata. Entries in Svensson but not found in automated search are labeled NA

manual_check.tsv:
id : GEO accession
is_sc : manual check if entry is truly single cell data
called_usable : original automated call if single cell dataset has metadata
correct_call : manual check if automated call was correct
has_type : manual check if dataset metadata actually contains cell type column/info
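
As a rough illustration of how the columns documented above might be used, the sketch below reads a file shaped like `fig2a.tsv` and computes the fraction of datasets with metadata per year. It assumes `usable` is coded as "yes"/"no", as the README's `gds$usable == "yes"` suggests. The accessions, the `usable2` values, and the temporary file are made up for the example.

```r
# Toy stand-in for fig2a.tsv; every value below is invented.
toy_path <- tempfile(fileext = ".tsv")
writeLines(c(
  "id\tyear\tusable\tusable2",
  "GSE0000001\t2019\tyes\ttsv",
  "GSE0000002\t2019\tno\tNA",
  "GSE0000003\t2020\tyes\trds",
  "GSE0000004\t2020\tyes\ttsv"
), toy_path)

fig2a <- read.delim(toy_path, stringsAsFactors = FALSE)

# Fraction of datasets per year whose `usable` value is "yes"
frac_by_year <- tapply(fig2a$usable == "yes", fig2a$year, mean, na.rm = TRUE)
print(frac_by_year)
```

Base R is used here only to keep the sketch free of dependencies; the same summary could be written with `dplyr::group_by()` and `summarise()`.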
7 changes: 0 additions & 7 deletions utils/geo_query.R
@@ -90,7 +90,6 @@ get_date <- function(res) {
}

get_pubmed <- function(res) {
# res <- res[[1]]
if (class(res) != "list") {
return(NA)
}
@@ -116,12 +115,6 @@ get_pubmed <- function(res) {
return(NA)
}
return(id2)
# tryCatch(
# get_pubmed_ids(id2) %>%
# fetch_pubmed_data() %>%
# article_to_df(),
# error = function(e) {"error"}
# )
}
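
A side note on the `class(res) != "list"` guard kept in `get_pubmed()`: `class()` can return more than one string, and since R 4.2.0 an `if` condition longer than one is an error rather than a warning. The sketch below shows a guard that always yields a single logical. The helper name `is_plain_list` is hypothetical and is not part of `utils/geo_query.R`.

```r
# Hypothetical helper: TRUE only for objects whose class vector includes "list".
# inherits() always returns a single TRUE/FALSE, whatever length class(res) has.
is_plain_list <- function(res) {
  inherits(res, "list")
}

multi_class <- structure(list(), class = c("a", "b"))

is_plain_list(list(a = 1))          # plain list
is_plain_list(NA)                   # not a list
is_plain_list(data.frame(x = 1))    # class is "data.frame", not "list"
is_plain_list(multi_class)          # two classes, still a length-one answer
```

`inherits(res, "list")` is chosen over `is.list()` here because `is.list()` is also `TRUE` for data frames, which would change what the original comparison accepts.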

from_list <- function(pubmedid) {