-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathworkbook06.Rmd
117 lines (73 loc) · 3.38 KB
/
workbook06.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
---
title: "STAT 33A Workbook 6"
date: "Oct 8, 2020"
author: "YOUR NAME (YOUR SID)"
output: pdf_document
---
This workbook is due __Oct 8, 2020__ by 11:59pm PT.
The workbook is organized into sections that correspond to the lecture videos
for the week. Watch a video, then do the corresponding exercises _before_
moving on to the next video.
Workbooks are graded for completeness, so as long as you make a clear effort to
solve each problem, you'll get full credit. That said, make sure you understand
the concepts here, because they're likely to reappear in homeworks, quizzes,
and later lectures.
As you work, write your answers in this notebook. Answer questions with
complete sentences, and put code in code chunks. You can make as many new code
chunks as you like.
In the notebook, you can run the line of code where the cursor is by pressing
`Ctrl` + `Enter` on Windows or `Cmd` + `Enter` on Mac OS X. You can run an
entire code chunk by clicking on the green arrow in the upper right corner of
the code chunk.
Please do not delete the exercises already in this notebook, because it may
interfere with our grading tools.
You need to submit your work in two places:
* Submit this Rmd file with your edits on bCourses.
* Knit and submit the generated PDF file on Gradescope.
If you have any last-minute trouble knitting, **DON'T PANIC**. Submit your Rmd
file on time and follow up in office hours or on Piazza to sort out the PDF.
Exploratory Data Analysis
=========================
Watch the "Exploratory Data Analysis" lecture video.
No exercises for this section. Almost done!
Distribution Plots
==================
Watch the "Distribution Plots" lecture video.
## Exercise 1
1. Use the Dogs data set and the ggplot2 package to create a density plot. The
plot should show the distribution of weights grouped by the grooming needs
of the breed.
2. Use the ggridges package to create a ridges plot that shows the same
information as the plot from part 1.
2. In 2-5 sentences, comment on if and how weights relate to grooming needs.
**YOUR ANSWER GOES HERE:**
Faceted Plots
=============
Watch the "Faceted Plots" lecture video.
## Exercise 2
1. Use the Datasaurus Dozen data set to create a faceted scatter plot.
Show each dataset from the Datasaurus Dozen in a separate facet. Use
`geom_smooth()` with `method = "lm"` to add a linear regression line to
each facet.
2. Is there any pattern to the regression lines across the different data sets?
**YOUR ANSWER GOES HERE:**
EDA Strategy
============
Watch the "EDA Strategy" lecture video.
No exercises for this section. Almost done!
EDA Examples
============
Watch the "EDA Examples" lecture video.
## Exercise 3
1. Come up with 2 questions it might be possible to answer using the Craigslist
Apartments data. Think about what the columns in the data set are actually
capable of answering. Avoid simple questions that are likely to yield a yes
or no answer with minimal investigation--these usually aren't interesting.
_Hint: Briefly inspect the data set to see what's there before coming up
with questions._
2. Make a plot or compute statistics to help answer your first question.
Discuss your result in 1-3 sentences. It's okay if you don't haven't
completely "solved" the question as long as you're able to meaningfully
address what it's asking.
3. Repeat part 2 for you second question.
**YOUR ANSWER GOES HERE:**