Commit a367ec7

Update 09-hypothesis-testing.Rmd
1 parent 8dea389 commit a367ec7

1 file changed, +3 -3 lines changed

09-hypothesis-testing.Rmd

@@ -742,7 +742,7 @@ prop_metal_popular <- round(prop_metal_popular, 3)
 prop_deephouse_popular <- round(prop_deephouse_popular, 3)
 ```
 
-So in this one sample of a hypothetical universe of no difference in genre popularity, $`r n_metal_popular`/26 = `r prop_metal_popular` = `r prop_metal_popular*100`\%$ of metal songs were popular. On the other hand, $`r n_deephouse_popular`/26 = `r prop_deephouse_popular` = `r prop_deephouse_popular*100`\%$ of deep house songs were popular. Let's next compare these two values. It appears that metal tracks were popular at a rate that was $`r prop_metal_popular ` - `r prop_deephouse_popular ` = `r diff_prop` = `r diff_prop*100`\%$ different than deep house songs.
+So in this one sample of a hypothetical universe of no difference in genre popularity, $`r n_metal_popular`/26 = `r prop_metal_popular` = `r prop_metal_popular*100`\%$ of metal songs were popular. On the other hand, $`r n_deephouse_popular`/26 = `r prop_deephouse_popular` = `r prop_deephouse_popular*100`\%$ of deep house songs were popular. Let's next compare these two values. It appears that metal tracks were popular at a rate that was $`r prop_metal_popular ` - `r prop_deephouse_popular ` = `r diff_prop` = `r diff_prop*100`$ percentage points different than deep house songs.
 
 Observe how this difference in rates is not the same as the difference in rates of `r observed_test_statistic` = `r observed_test_statistic*100`% we originally observed. This is once again due to *sampling variation*. How can we better understand the effect of this sampling variation? By repeating this shuffling several times!
 
@@ -788,9 +788,9 @@ sampling_scenarios |>
 
 So, based on our sample of $n_m = 1000$ metal tracks and $n_f = 1000$ deep house tracks, the *point estimate* for $p_{m} - p_{d}$ is the *difference in sample proportions*
 
-$$\widehat{p}_{m} -\widehat{p}_{f} = `r p_metal_popular` - `r p_deephouse_popular` = `r observed_test_statistic` = `r observed_test_statistic*100`\%$$.
+$$\widehat{p}_{m} -\widehat{p}_{f} = `r p_metal_popular` - `r p_deephouse_popular` = `r observed_test_statistic`$$.
 
-This difference in favor of metal songs of `r observed_test_statistic` is greater than 0, suggesting metal songs are more popular than deep house songs.
+This difference in favor of metal songs of `r observed_test_statistic` (`r observed_test_statistic*100` percentage points) is greater than 0, suggesting metal songs are more popular than deep house songs.
 
 However, the question we ask ourselves was "is this difference meaningfully greater than 0?". In other words, is that difference indicative of true popularity, or can we just attribute it to *sampling variation*? Hypothesis testing allows us to make such distinctions.
 
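A short worked note on the change from `\%` to "percentage points", since the distinction is easy to miss: subtracting two proportions gives an absolute gap, which is not the same as a relative (percent) change. The values below are hypothetical and chosen only for round arithmetic; they are not taken from the chapter's data.

```latex
% Hypothetical sample proportions (assumed for illustration only)
\widehat{p}_{m} = 0.60, \qquad \widehat{p}_{d} = 0.50

% Absolute difference: reported in percentage points
\widehat{p}_{m} - \widehat{p}_{d} = 0.60 - 0.50 = 0.10 = 10 \text{ percentage points}

% Relative difference: reported in percent
\frac{\widehat{p}_{m} - \widehat{p}_{d}}{\widehat{p}_{d}} = \frac{0.10}{0.50} = 0.20 = 20\%
```

Under these assumed values, calling the gap "10%" could be misread as the relative figure, which is why the difference is better labelled in percentage points.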
