forked from everyxs/openScience
Rerun the workflow with Snakemake
XiaoranYan edited this page Sep 27, 2019
- Input files:
- "input/Open-old.csv", "input/Reproduce-old.csv": The raw TSV output for papers tagged with the "open science" and "reproducibility" fields of study in MAG (Microsoft Academic Graph). Generated directly with a U-SQL query from a 02-03-2019 snapshot of MAG on Azure.
- "input/OpenSci3Journal.csv": The journal-matched CSV for all "open science" and "reproducibility" papers after data transformation and gender detection. Compared with the intermediate output "OpenSci3.csv", it has additional columns that map papers to standardized Web of Science (WoS) journal names and WoS IDs.
- "input/Lancet Dictionaries.csv": Custom dictionaries for sentiment analysis (Jorge please add explanations here).
- Output files:
- "figures/openScience.pdf", "figures/reproducibility.pdf": Network visualizations using Gephi. The networks are built after data transformation and cleaning (Figure 1).
- "output/OpenSci3.csv", "output/OpenSci3Discipline.csv": Intermediate output data tables used in the paper for analysis. OpenSci3 is the result after data transformation and cleaning. OpenSci3Discipline includes additional columns that map papers/journals to their corresponding disciplines according to the UCSD map of science (https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0039464).
- "figures/SinglePie.pdf", "figures/MultiPie.pdf": Pie charts "Gender representation in high-status author positions" (Figure 2).
- "figures/TeamSizeHist_fem.pdf", "figures/PredictedProbs_multiauthor_spline2.pdf": Team size and women’s representation in high-status positions (Figure 3), and estimated regression effects of team size and year of publication on women’s representation in high-status positions (Figure 4).
- "figures/ProSocialHist.pdf": Distribution of communal and pro-social word density of abstracts in the Open Science and Reproducibility literatures (Figure 5).
Once you are inside the Binder environment, start a new terminal. Then run the following commands:
cd code-data
snakemake -F
snakemake --forceall --dag | dot -Tpng > workflow.png
You can delete output files such as "OpenSci3.csv" and regenerate them with the commands above. The last command also renders the workflow DAG to "workflow.png".
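To confirm that a rerun regenerated everything, one option is to check the listed files before and after running Snakemake. The following is only a sketch: the helper names (`missing_files`, `check_inputs`, `check_outputs`, `rerun_and_check`) are hypothetical, and it assumes the paths listed on this page are relative to the code-data directory.

```shell
# Sketch only: helper names are hypothetical, and paths are assumed to be
# relative to the code-data directory.

# Print each argument that does not exist; return 1 if any is missing.
missing_files() {
    missing_status=0
    for f in "$@"; do
        if [ ! -e "$f" ]; then
            echo "missing: $f"
            missing_status=1
        fi
    done
    return "$missing_status"
}

# Input files listed on this page.
check_inputs() {
    missing_files \
        "input/Open-old.csv" \
        "input/Reproduce-old.csv" \
        "input/OpenSci3Journal.csv" \
        "input/Lancet Dictionaries.csv"
}

# Output files listed on this page.
check_outputs() {
    missing_files \
        "figures/openScience.pdf" \
        "figures/reproducibility.pdf" \
        "output/OpenSci3.csv" \
        "output/OpenSci3Discipline.csv" \
        "figures/SinglePie.pdf" \
        "figures/MultiPie.pdf" \
        "figures/TeamSizeHist_fem.pdf" \
        "figures/PredictedProbs_multiauthor_spline2.pdf" \
        "figures/ProSocialHist.pdf"
}

# Rerun the whole workflow only if all inputs are present, then verify outputs.
rerun_and_check() {
    check_inputs || return 1
    snakemake -F || return 1
    check_outputs
}
```

After `cd code-data`, calling `rerun_and_check` stops early if an input is missing and otherwise reports any listed output that the rerun did not produce.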