-
Notifications
You must be signed in to change notification settings - Fork 1
/
project-tips-resources.qmd
175 lines (119 loc) · 7.82 KB
/
project-tips-resources.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
---
title: "Project tips + resources"
---
```{r include = F}
library(tidyverse)
library(knitr)
library(broom)
```
## Data sources
### Some resources that may be helpful as you find data:
- [R Data Sources for Regression Analysis](https://rfun.library.duke.edu/blog/data-sources-for-regression-analysis/)
- [FiveThirtyEight data](https://data.fivethirtyeight.com/)
- [TidyTuesday](https://github.com/rfordatascience/tidytuesday)
### Other data repositories
- [World Health Organization](https://www.who.int/gho/database/en/)
- [The National Bureau of Economic Research](https://data.nber.org/data/)
- [International Monetary Fund](https://data.imf.org/?sk=388DFA60-1D26-4ADE-B505-A05A558D9A42&sId=1479329328660)
- [General Social Survey](http://gss.norc.org/)
- [United Nations Data](http://data.un.org/)
- [United Nations Statistics Division](https://unstats.un.org/home/)
- [U.K. Data](https://data.gov.uk/)
- [U.S. Data](https://www.data.gov/)
- [U.S. Census Data](https://www.census.gov/data.html)
- [European Statistics](https://ec.europa.eu/eurostat/)
- [Statistics Canada](https://www.statcan.gc.ca/eng/start)
- [Pew Research](https://www.pewresearch.org/download-datasets/)
- [UNICEF](https://data.unicef.org/)
- [CDC](https://www.cdc.gov/datastatistics/index.html)
- [World Bank](https://datacatalog.worldbank.org/)
- [Election Studies](https://electionstudies.org//)
## Tips
- Ask questions if any of the expectations are unclear.
- **Code:** In your write up your code should be hidden (`echo = FALSE`) so that your document is neat and easy to read. However your document should include all your code such that if I re-knit your qmd file I should be able to obtain the results you presented.
- **Exception:** If you want to highlight something specific about a piece of code, you're welcome to show that portion.
- Merge conflicts will happen, issues will arise, and that's fine! Commit and push often, and ask questions when stuck.
- Make sure each team member is contributing, both in terms of quality and quantity of contribution (we will be reviewing commits from different team members).
- All team members are expected to contribute equally to the completion of this assignment and group assessments will be given at its completion - anyone judged to not have sufficient contributed to the final product will have their grade penalized. While different teams members may have different backgrounds and abilities, it is the responsibility of every team member to understand how and why all code and approaches in the assignment works.
## Formatting + communication tips
### Suppress Code, Warnings, & Messages
- Include the following code in a code chunk at the top of your .qmd file to suppress all code, warnings, and other messages. Use the code chunk header `{r set-up, include = FALSE}` to suppress this set up code.
``` r
knitr::opts_chunk$set(echo = FALSE,
warning = FALSE,
message = FALSE)
```
### Headers
- Use headers to clearly label each section.
- Inspect the document outline to review your headers and sub-headers.
### References
- Include all references in a section called "References" at the end of the report.
- This course does not have specific requirements for formatting citations and references.
### Appendix
- If you have additional work that does not fit or does not belong in the body of the report, you may put it at the end of the document in section called "Appendix".
- The items in the appendix should be properly labeled.
- The appendix should only be for additional material. The reader should be able to fully understand your report without viewing content in the appendix.
### Resize figures
Resize plots and figures, so you have more space for the narrative.
### Arranging plots
Arrange plots in a grid, instead of one after the other. This is especially useful when displaying plots for exploratory data analysis and to check assumptions.
If you're using ggplot2 functions, the `patchwork` package makes it easy to arrange plots in a grid. See the documentation and examples [here](https://patchwork.data-imaginist.com/).
### Plot titles and axis labels
Be sure all plot titles and axis labels are visible and easy to read.
- Use informative titles, *not* variable names, for titles and axis labels.
❌ **NO! The x-axis is hard to read because the names overlap.**
```{r}
ggplot(data = mpg, aes(x = manufacturer)) +
geom_bar()
```
✅ **YES! Names are readable**
```{r}
ggplot(data = mpg, aes(y = manufacturer)) +
geom_bar()
```
### Do a little more to make the plot look professional!
- Informative title and axis labels
- Flipped coordinates to make names readable
- Arranged bars based on count
- Capitalized manufacturer names
- *Optional: Added color - Use a coordinated color scheme throughout paper / presentation*
- *Optional: Applied a theme - Use same theme throughout paper / presentation*
```{r}
mpg %>%
count(manufacturer) %>%
mutate(manufacturer = str_to_title(manufacturer)) %>%
ggplot(aes(y = fct_reorder(manufacturer,n), x = n)) +
geom_bar(stat = "identity", fill = "steelblue") +
labs(x = "Manufacturer",
y = "Count",
title = "The most common manufacturer is Dodge") +
theme_minimal()
```
### Tables and model output
- Use the `kable` function from the knitr package to neatly output all tables and model output. This will also ensure all model coefficients are displayed.
- Use the `digits` argument to display only 3 or 4 significant digits.
- Use the `caption` argument to add captions to your table.
```{r}
model <- lm(mpg ~ hp, data = mtcars)
tidy(model) %>%
kable(digits = 3)
```
### Guidelines for communicating results
- **Don't use variable names in your narrative!** Use descriptive terms, so the reader understands your narrative without relying on the data dictionary.
- ❌ There is a negative linear relationship between mpg and hp.
- ✅ There is a negative linear relationship between a car's fuel economy (in miles per gallon) and its horsepower.
- **Know your audience:** Your report should be written for a general audience who has an understanding of statistics at the level of STA 210.
- **Avoid subject matter jargon:** Don't assume the audience knows all of the specific terminology related to your subject area. If you must use jargon, include a brief definition the first time you introduce a term.
- **Tell the "so what":** Your report and presentation should be more than a list of interpretations and technical definitions. Focus on what the results mean, i.e. what you want the audience to know about your topic after reading your report or viewing your presentation.
- ❌ For every one unit increase in horsepower, we expect the miles per gallon to decrease by 0.068 units, on average.
- ✅ If the priority is to have good fuel economy, then one should choose a car with lower horsepower. Based on our model, the fuel economy is expected to decrease, on average, by 0.68 miles per gallon for every 10 additional horsepower.
- **Tell a story:** All visualizations, tables, model output, and narrative should tell a cohesive story!
- **Use one voice:** Though multiple people are writing the report, it should read as if it's from a single author. At least one team member should read through the report before submission to ensure it reads like a cohesive document.
## Additional resources
- [R for Data Science](https://r4ds.had.co.nz/)
- [Quarto Documentation](https://quarto.org/)
- Data visualization
- [ggplot2 Reference](https://ggplot2.tidyverse.org/reference/index.html)
- [ggplot2: Elegant Graphics for Data Analysis](https://ggplot2-book.org/)
- [Data Visualization: A Practice Introduction](https://socviz.co/index.html)
- [Patchwork R Package](https://patchwork.data-imaginist.com/index.html)