bifidotftw/splashRNA_webscraper: Downloads the result csv file from http://splashrna.mskcc.org/ for a list of entrezIDs

Folder structure:
+-- data
| +-- processed
| +-- raw
+-- script

Convert HGNC symbols to entrezID and save to data/genes.xlsx
https://www.ensembl.org/biomart/martview
1. Ensembl Genes 100
2. Human genes (GRCh38.p13)
3. Filter/Gene/Input external references ID list: HGNC symbol(s) [e.g. A1BG]
4. Attributes/External References: HGNC symbol, NCBI gene (formerly Entrezgene) ID
5. Download results as "genes.xlsx"
Rename columns to "entrezID" and "HGNC"
Check for missing entries in "entrezID" and replace manually
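The check for missing entries can be automated before filling them in by hand. This is a minimal sketch, not part of the repository. It assumes the gene table has also been exported as a CSV file (a hypothetical "data/genes.csv") with the renamed "entrezID" and "HGNC" columns, because the standard library cannot read xlsx directly. The function name find_missing_entrez is made up for illustration.

```python
import csv


def find_missing_entrez(csv_path):
    """Return the HGNC symbols whose entrezID cell is empty or whitespace.

    Assumes a CSV export of the gene table with the columns
    "entrezID" and "HGNC" (hypothetical file, not created by the repo).
    """
    missing = []
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            # Treat absent, empty, and whitespace-only cells as missing.
            if not (row.get("entrezID") or "").strip():
                missing.append(row.get("HGNC", ""))
    return missing
```

Calling find_missing_entrez("data/genes.csv") would list the symbols that still need an ID looked up manually.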
webscraper.py: Downloads csv file from splashRNA
Copy files from "data/raw" to "data/processed"
truncate3lines: bash script to remove first 3 lines from all csv files in "data/processed"
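For readers without a bash shell, the copy and truncation steps can be approximated in Python. This is a sketch under assumptions, not the repository's truncate3lines script: the function name copy_and_truncate and its parameters are hypothetical, and it simply drops the first three lines of every csv file it copies.

```python
import shutil
from pathlib import Path


def copy_and_truncate(raw_dir, processed_dir, n_lines=3):
    """Copy every csv file from raw_dir to processed_dir and drop the
    first n_lines lines of each copy. The raw files stay untouched."""
    raw_dir, processed_dir = Path(raw_dir), Path(processed_dir)
    processed_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(raw_dir.glob("*.csv")):
        dst = processed_dir / src.name
        shutil.copy2(src, dst)
        # Keep line endings so the remaining lines are written back unchanged.
        lines = dst.read_text().splitlines(keepends=True)
        dst.write_text("".join(lines[n_lines:]))
        written.append(dst)
    return written
```

A call such as copy_and_truncate("data/raw", "data/processed") would cover both steps in one pass.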
datamerger.py: Compile individual csv files into single file. Creates "data/97mer.xlsx"
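The merge step can be illustrated with the standard library alone. The sketch below is not datamerger.py itself: it writes a merged CSV rather than "data/97mer.xlsx", it assumes that every truncated file starts with the same header row, and it adds a hypothetical "source_file" column so that each row can be traced back to its input file. The function name merge_csv_files is also an assumption.

```python
import csv
from pathlib import Path


def merge_csv_files(processed_dir, out_path):
    """Concatenate all csv files in processed_dir into one CSV file.

    Assumes each input file begins with an identical header row.
    Returns the number of data rows written.
    """
    out_path = Path(out_path)
    header = None
    rows_written = 0
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        for path in sorted(Path(processed_dir).glob("*.csv")):
            if path.resolve() == out_path.resolve():
                continue  # never read the output file back in
            with open(path, newline="") as handle:
                reader = csv.reader(handle)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip empty files
                if header is None:
                    header = file_header
                    writer.writerow(["source_file"] + header)
                elif file_header != header:
                    raise ValueError(f"{path.name}: header differs from first file")
                for row in reader:
                    # Prefix each row with the file name (without extension).
                    writer.writerow([path.stem] + row)
                    rows_written += 1
    return rows_written
```

If the inputs are named after their entrezID, the extra column keeps that ID next to every row in the merged file.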
