
Commit 8a0743e

Deploying to gh-pages from @ f92cf5d 🚀
1 parent 2cee126 commit 8a0743e

6 files changed, +203 -210 lines changed

v2/ModernDive.pdf

3.09 KB
Binary file not shown.

v2/ModernDive.tex

+137-144
Large diffs are not rendered by default.

v2/appendixC.html

+36-36
@@ -329,18 +329,18 @@ <h4>
 <span> <span class="fu"><a href="https://dplyr.tidyverse.org/reference/mutate.html">mutate</a></span><span class="op">(</span>type_new <span class="op">=</span> <span class="fu"><a href="https://dplyr.tidyverse.org/reference/if_else.html">if_else</a></span><span class="op">(</span><span class="va">type</span> <span class="op">==</span> <span class="st">"rom comedy"</span>, <span class="st">"romantic comedy"</span>, <span class="va">type</span><span class="op">)</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span></span>
 <span> <span class="fu"><a href="https://dplyr.tidyverse.org/reference/slice.html">slice</a></span><span class="op">(</span><span class="fl">1</span><span class="op">:</span><span class="fl">10</span><span class="op">)</span></span></code></pre></div>
 <pre><code># A tibble: 10 × 7
-name score rating type millions revenue type_new
-&lt;chr&gt; &lt;dbl&gt; &lt;chr&gt; &lt;chr&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;chr&gt;
-1 2 Fast 2 Furious 48.9000 PG-13 action NA NA action
-2 A Guy Thing 39.5 PG-13 rom comedy 15.545 15545000 romantic com…
-3 A Man Apart 42.9000 R action 26.2480 26247999 action
-4 A Mighty Wind 79.9000 PG-13 comedy 17.781 17781000 comedy
-5 Agent Cody Banks 57.9000 PG action 47.8110 47811001 action
-6 Alex &amp; Emma 35.1000 PG-13 rom comedy 14.219 14219000 romantic com…
-7 American Wedding 50.7000 R comedy 104.441 104441000 comedy
-8 Anger Management 62.6000 PG-13 comedy 134.404 134404010 comedy
-9 Anything Else 63.3000 R rom comedy 3.21200 3212000. romantic com…
-10 Bad Boys II 38.1000 R action 138.397 138397000 action </code></pre>
+name score rating type millions revenue type_new
+&lt;chr&gt; &lt;dbl&gt; &lt;chr&gt; &lt;chr&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;chr&gt;
+1 2 Fast 2 Furious 48.9000 PG-13 action NA NA action
+2 A Guy Thing 39.5 PG-13 rom comedy 15.545 15545000 romantic comedy
+3 A Man Apart 42.9000 R action 26.2480 26247999 action
+4 A Mighty Wind 79.9000 PG-13 comedy 17.781 17781000 comedy
+5 Agent Cody Banks 57.9000 PG action 47.8110 47811001 action
+6 Alex &amp; Emma 35.1000 PG-13 rom comedy 14.219 14219000 romantic comedy
+7 American Wedding 50.7000 R comedy 104.441 104441000 comedy
+8 Anger Management 62.6000 PG-13 comedy 134.404 134404010 comedy
+9 Anything Else 63.3000 R rom comedy 3.21200 3212000. romantic comedy
+10 Bad Boys II 38.1000 R action 138.397 138397000 action </code></pre>
 <p>Do the same here, but return <code>"not romantic comedy"</code> if <code>type</code> is not <code>"rom comedy"</code> and this time overwrite the original <code>type</code> variable:</p>
 <div class="sourceCode" id="cb707"><pre class="downlit sourceCode r">
 <code class="sourceCode R"><span><span class="va">movies_ex</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span></span>
@@ -377,18 +377,18 @@ <h4>
 <span> <span class="op">)</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span></span>
 <span> <span class="fu"><a href="https://dplyr.tidyverse.org/reference/slice.html">slice</a></span><span class="op">(</span><span class="fl">1</span><span class="op">:</span><span class="fl">10</span><span class="op">)</span></span></code></pre></div>
 <pre><code># A tibble: 10 × 7
-name score rating type millions revenue type_new
-&lt;chr&gt; &lt;dbl&gt; &lt;chr&gt; &lt;chr&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;chr&gt;
-1 2 Fast 2 Furious 48.9000 PG-13 action NA NA Action
-2 A Guy Thing 39.5 PG-13 rom comedy 15.545 15545000 Romantic Com…
-3 A Man Apart 42.9000 R action 26.2480 26247999 Action
-4 A Mighty Wind 79.9000 PG-13 comedy 17.781 17781000 Comedy
-5 Agent Cody Banks 57.9000 PG action 47.8110 47811001 Action
-6 Alex &amp; Emma 35.1000 PG-13 rom comedy 14.219 14219000 Romantic Com…
-7 American Wedding 50.7000 R comedy 104.441 104441000 Comedy
-8 Anger Management 62.6000 PG-13 comedy 134.404 134404010 Comedy
-9 Anything Else 63.3000 R rom comedy 3.21200 3212000. Romantic Com…
-10 Bad Boys II 38.1000 R action 138.397 138397000 Action </code></pre>
+name score rating type millions revenue type_new
+&lt;chr&gt; &lt;dbl&gt; &lt;chr&gt; &lt;chr&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;chr&gt;
+1 2 Fast 2 Furious 48.9000 PG-13 action NA NA Action
+2 A Guy Thing 39.5 PG-13 rom comedy 15.545 15545000 Romantic Comedy
+3 A Man Apart 42.9000 R action 26.2480 26247999 Action
+4 A Mighty Wind 79.9000 PG-13 comedy 17.781 17781000 Comedy
+5 Agent Cody Banks 57.9000 PG action 47.8110 47811001 Action
+6 Alex &amp; Emma 35.1000 PG-13 rom comedy 14.219 14219000 Romantic Comedy
+7 American Wedding 50.7000 R comedy 104.441 104441000 Comedy
+8 Anger Management 62.6000 PG-13 comedy 134.404 134404010 Comedy
+9 Anything Else 63.3000 R rom comedy 3.21200 3212000. Romantic Comedy
+10 Bad Boys II 38.1000 R action 138.397 138397000 Action </code></pre>
 </div>
 <div id="case_when" class="section level4 unnumbered">
 <h4>
@@ -407,18 +407,18 @@ <h4>
 <span> <span class="op">)</span></span>
 <span> <span class="op">)</span></span></code></pre></div>
 <pre><code># A tibble: 108 × 7
-name score rating type millions revenue type_new
-&lt;chr&gt; &lt;dbl&gt; &lt;chr&gt; &lt;chr&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;chr&gt;
-1 2 Fast 2 Furious 48.9000 PG-13 action NA NA Rest
-2 A Guy Thing 39.5 PG-13 rom comedy 15.545 15545000 Small budget
-3 A Man Apart 42.9000 R action 26.2480 26247999 Rest
-4 A Mighty Wind 79.9000 PG-13 comedy 17.781 17781000 Rest
-5 Agent Cody Banks 57.9000 PG action 47.8110 47811001 Big budget a…
-6 Alex &amp; Emma 35.1000 PG-13 rom comedy 14.219 14219000 Small budget
-7 American Wedding 50.7000 R comedy 104.441 104441000 Rest
-8 Anger Management 62.6000 PG-13 comedy 134.404 134404010 Rest
-9 Anything Else 63.3000 R rom comedy 3.21200 3212000. Small budget
-10 Bad Boys II 38.1000 R action 138.397 138397000 Big budget a…
+name score rating type millions revenue type_new
+&lt;chr&gt; &lt;dbl&gt; &lt;chr&gt; &lt;chr&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;chr&gt;
+1 2 Fast 2 Furious 48.9000 PG-13 action NA NA Rest
+2 A Guy Thing 39.5 PG-13 rom comedy 15.545 15545000 Small budget romcom
+3 A Man Apart 42.9000 R action 26.2480 26247999 Rest
+4 A Mighty Wind 79.9000 PG-13 comedy 17.781 17781000 Rest
+5 Agent Cody Banks 57.9000 PG action 47.8110 47811001 Big budget action
+6 Alex &amp; Emma 35.1000 PG-13 rom comedy 14.219 14219000 Small budget romcom
+7 American Wedding 50.7000 R comedy 104.441 104441000 Rest
+8 Anger Management 62.6000 PG-13 comedy 134.404 134404010 Rest
+9 Anything Else 63.3000 R rom comedy 3.21200 3212000. Small budget romcom
+10 Bad Boys II 38.1000 R action 138.397 138397000 Big budget action
 # ℹ 98 more rows</code></pre>
 </div>
 </div>

v2/scripts/10-inference-for-regression.R

+2-2
@@ -622,7 +622,7 @@ mod_mult_table |>

 ## ----echo=FALSE---------------------------------------------------------------
 # Fix the width for the explanatory variable output
-options(width = 125)
+#options(width = 125)
 if (!file.exists("rds/generated_distn_slopes.rds")) {
 set.seed(76)
 generated_distn_slopes <- coffee_data |>
@@ -639,7 +639,7 @@ if (!file.exists("rds/generated_distn_slopes.rds")) {
 }
 generated_distn_slopes
 # Reset width
-options(width = 80)
+#options(width = 80)


 ## ----eval=FALSE---------------------------------------------------------------
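The `v2/scripts/10-inference-for-regression.R` hunk comments out a pair of calls that widened the console to 125 characters and then reset it to a hard-coded 80. If a temporarily wider console is wanted again, one base-R idiom (a sketch, not the book's own code) is to keep the list that `options()` returns, which holds the previous settings, and restore that instead of assuming the old width was 80. The vector `x` is a hypothetical stand-in for wide printed output.

```r
# options() returns a list of the *previous* values of the options it
# changes, so the earlier state can be restored exactly afterwards.
old_opts <- options(width = 125)

# Hypothetical stand-in for wide printed output
x <- seq_len(60)
print(x)  # wrapped at up to 125 characters per line

# Restore whatever width was in effect before, rather than
# hard-coding options(width = 80)
options(old_opts)
getOption("width")
```

The withr package wraps the same save-and-restore pattern around a single expression, for example `withr::with_options(list(width = 125), print(x))`.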

v2/search.json

+1-1
Large diffs are not rendered by default.

v2/thinking-with-data.html

+27-27
@@ -205,27 +205,27 @@ <h3>
 <span><span class="fu"><a href="https://pillar.r-lib.org/reference/glimpse.html">glimpse</a></span><span class="op">(</span><span class="va">house_prices</span><span class="op">)</span></span></code></pre></div>
 <pre><code>Rows: 21,613
 Columns: 21
-$ id &lt;chr&gt; "7129300520", "6414100192", "5631500400", "2487200875", …
-$ date &lt;date&gt; 2014-10-13, 2014-12-09, 2015-02-25, 2014-12-09, 2015-02…
-$ price &lt;dbl&gt; 221900, 538000, 180000, 604000, 510000, 1225000, 257500,…
-$ bedrooms &lt;int&gt; 3, 3, 2, 4, 3, 4, 3, 3, 3, 3, 3, 2, 3, 3, 5, 4, 3, 4, 2,…
-$ bathrooms &lt;dbl&gt; 1.00, 2.25, 1.00, 3.00, 2.00, 4.50, 2.25, 1.50, 1.00, 2.…
-$ sqft_living &lt;int&gt; 1180, 2570, 770, 1960, 1680, 5420, 1715, 1060, 1780, 189
-$ sqft_lot &lt;int&gt; 5650, 7242, 10000, 5000, 8080, 101930, 6819, 9711, 7470,…
-$ floors &lt;dbl&gt; 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 2.0, 1.0, 1…
-$ waterfront &lt;lgl&gt; FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, …
-$ view &lt;int&gt; 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0,…
-$ condition &lt;fct&gt; 3, 3, 3, 5, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3, 3, 3, 4, 4,…
-$ grade &lt;fct&gt; 7, 7, 6, 7, 8, 11, 7, 7, 7, 7, 8, 7, 7, 7, 7, 9, 7, 7, 7…
-$ sqft_above &lt;int&gt; 1180, 2170, 770, 1050, 1680, 3890, 1715, 1060, 1050, 189
-$ sqft_basement &lt;int&gt; 0, 400, 0, 910, 0, 1530, 0, 0, 730, 0, 1700, 300, 0, 0, …
-$ yr_built &lt;int&gt; 1955, 1951, 1933, 1965, 1987, 2001, 1995, 1963, 1960, 20
-$ yr_renovated &lt;int&gt; 0, 1991, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
-$ zipcode &lt;fct&gt; 98178, 98125, 98028, 98136, 98074, 98053, 98003, 98198, …
-$ lat &lt;dbl&gt; 47.5, 47.7, 47.7, 47.5, 47.6, 47.7, 47.3, 47.4, 47.5, 47…
-$ long &lt;dbl&gt; -122, -122, -122, -122, -122, -122, -122, -122, -122, -1…
-$ sqft_living15 &lt;int&gt; 1340, 1690, 2720, 1360, 1800, 4760, 2238, 1650, 1780, 23
-$ sqft_lot15 &lt;int&gt; 5650, 7639, 8062, 5000, 7503, 101930, 6819, 9711, 8113, …</code></pre>
+$ id &lt;chr&gt; "7129300520", "6414100192", "5631500400", "2487200875", "1954400510", "7237550310", "1
+$ date &lt;date&gt; 2014-10-13, 2014-12-09, 2015-02-25, 2014-12-09, 2015-02-18, 2014-05-12, 2014-06-27, 2
+$ price &lt;dbl&gt; 221900, 538000, 180000, 604000, 510000, 1225000, 257500, 291850, 229500, 323000, 66250
+$ bedrooms &lt;int&gt; 3, 3, 2, 4, 3, 4, 3, 3, 3, 3, 3, 2, 3, 3, 5, 4, 3, 4, 2, 3, 4, 3, 5, 2, 3, 3, 3, 3, 3,
+$ bathrooms &lt;dbl&gt; 1.00, 2.25, 1.00, 3.00, 2.00, 4.50, 2.25, 1.50, 1.00, 2.50, 2.50, 1.00, 1.00, 1.75, 2.
+$ sqft_living &lt;int&gt; 1180, 2570, 770, 1960, 1680, 5420, 1715, 1060, 1780, 1890, 3560, 1160, 1430, 1370, 181
+$ sqft_lot &lt;int&gt; 5650, 7242, 10000, 5000, 8080, 101930, 6819, 9711, 7470, 6560, 9796, 6000, 19901, 9680
+$ floors &lt;dbl&gt; 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.5, 1.0, 1.5, 2.0, 2.0, 1
+$ waterfront &lt;lgl&gt; FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FA
+$ view &lt;int&gt; 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0,
+$ condition &lt;fct&gt; 3, 3, 3, 5, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 4, 5, 3, 5, 3,
+$ grade &lt;fct&gt; 7, 7, 6, 7, 8, 11, 7, 7, 7, 7, 8, 7, 7, 7, 7, 9, 7, 7, 7, 7, 7, 9, 8, 7, 8, 6, 8, 8, 7
+$ sqft_above &lt;int&gt; 1180, 2170, 770, 1050, 1680, 3890, 1715, 1060, 1050, 1890, 1860, 860, 1430, 1370, 1810
+$ sqft_basement &lt;int&gt; 0, 400, 0, 910, 0, 1530, 0, 0, 730, 0, 1700, 300, 0, 0, 0, 970, 0, 0, 0, 0, 760, 720,
+$ yr_built &lt;int&gt; 1955, 1951, 1933, 1965, 1987, 2001, 1995, 1963, 1960, 2003, 1965, 1942, 1927, 1977, 19
+$ yr_renovated &lt;int&gt; 0, 1991, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+$ zipcode &lt;fct&gt; 98178, 98125, 98028, 98136, 98074, 98053, 98003, 98198, 98146, 98038, 98007, 98115, 98
+$ lat &lt;dbl&gt; 47.5, 47.7, 47.7, 47.5, 47.6, 47.7, 47.3, 47.4, 47.5, 47.4, 47.6, 47.7, 47.8, 47.6, 47
+$ long &lt;dbl&gt; -122, -122, -122, -122, -122, -122, -122, -122, -122, -122, -122, -122, -122, -122, -1…
+$ sqft_living15 &lt;int&gt; 1340, 1690, 2720, 1360, 1800, 4760, 2238, 1650, 1780, 2390, 2210, 1330, 1780, 1370, 13
+$ sqft_lot15 &lt;int&gt; 5650, 7639, 8062, 5000, 7503, 101930, 6819, 9711, 8113, 7570, 8925, 6000, 12697, 10208</code></pre>
 <p>Here are some questions you can ask yourself at this stage of an EDA: Which variables are numerical? Which are categorical? For the categorical variables, what are their levels? Besides the variables we’ll be using in our regression model, what other variables do you think would be useful to use in a model for predicting house price?</p>
 <p>Observe, for example, with the raw data that while the <code>condition</code> variable has values <code>1</code> through <code>5</code>, these are saved in R as <code>fct</code> standing for “factors.” Recall this is one of R’s ways of saving categorical variables. So you should think of these as the “labels” <code>1</code> through <code>5</code> and not the numerical values <code>1</code> through <code>5</code>.</p>
 <p>Let’s now perform the second step in an EDA: computing summary statistics. Recall from Section <a href="wrangling.html#summarize">3.3</a> that <em>summary statistics</em> are single numerical values that summarize a large number of values. Examples of summary statistics include the mean, the median, the standard deviation, and various percentiles.</p>
@@ -1382,12 +1382,12 @@ <h3>
 <code class="sourceCode R"><span><span class="fu"><a href="https://pillar.r-lib.org/reference/glimpse.html">glimpse</a></span><span class="op">(</span><span class="va">US_births_1994_2003</span><span class="op">)</span></span></code></pre></div>
 <pre><code>Rows: 3,652
 Columns: 6
-$ year &lt;int&gt; 1994, 1994, 1994, 1994, 1994, 1994, 1994, 1994, 1994, 19…
-$ month &lt;int&gt; 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
-$ date_of_month &lt;int&gt; 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1
-$ date &lt;date&gt; 1994-01-01, 1994-01-02, 1994-01-03, 1994-01-04, 1994-01…
-$ day_of_week &lt;ord&gt; Sat, Sun, Mon, Tues, Wed, Thurs, Fri, Sat, Sun, Mon, Tue
-$ births &lt;int&gt; 8096, 7772, 10142, 11248, 11053, 11406, 11251, 8653, 791</code></pre>
+$ year &lt;int&gt; 1994, 1994, 1994, 1994, 1994, 1994, 1994, 1994, 1994, 1994, 1994, 1994, 1994, 1994, 19…
+$ month &lt;int&gt; 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+$ date_of_month &lt;int&gt; 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
+$ date &lt;date&gt; 1994-01-01, 1994-01-02, 1994-01-03, 1994-01-04, 1994-01-05, 1994-01-06, 1994-01-07, 1
+$ day_of_week &lt;ord&gt; Sat, Sun, Mon, Tues, Wed, Thurs, Fri, Sat, Sun, Mon, Tues, Wed, Thurs, Fri, Sat, Sun,
+$ births &lt;int&gt; 8096, 7772, 10142, 11248, 11053, 11406, 11251, 8653, 7910, 10498, 11706, 11567, 11212,</code></pre>
 <p>We’ll focus on the number of <code>births</code> for each <code>date</code>, but only for births that occurred in 1999. Recall from Section <a href="wrangling.html#filter">3.2</a> we can do this using the <code><a href="https://dplyr.tidyverse.org/reference/filter.html">filter()</a></code> function from the <code>dplyr</code> package:</p>
 <div class="sourceCode" id="cb567"><pre class="downlit sourceCode r">
 <code class="sourceCode R"><span><span class="va">US_births_1999</span> <span class="op">&lt;-</span> <span class="va">US_births_1994_2003</span> <span class="op">|&gt;</span></span>
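The `v2/thinking-with-data.html` context notes that `condition` is stored as a factor, so its values `1` through `5` should be read as labels, not numbers. A small base-R sketch shows why that distinction matters: `as.integer()` on a factor returns each value's position among the levels, not the printed label. The short `condition` vector here is hypothetical and does not come from `house_prices`.

```r
# Hypothetical ratings stored as a factor, mirroring house_prices$condition
condition <- factor(c("3", "5", "3", "4"))

# The levels are the distinct labels, sorted: "3" "4" "5"
levels(condition)

# as.integer() gives each value's position among the levels: 1 3 1 2
as.integer(condition)

# Converting through the labels recovers the ratings themselves: 3 5 3 4
as.integer(as.character(condition))
```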
