Update 09-hypothesis-testing.Rmd

ismayc · web-flow · commit a367ec7a935a · 2025-02-19T08:14:11.000-07:00
diff --git a/09-hypothesis-testing.Rmd b/09-hypothesis-testing.Rmd
@@ -742,7 +742,7 @@ prop_metal_popular <- round(prop_metal_popular, 3)
 prop_deephouse_popular <- round(prop_deephouse_popular, 3)
 ```
 
-So in this one sample of a hypothetical universe of no difference in genre popularity, $`r n_metal_popular`/26 = `r prop_metal_popular` = `r prop_metal_popular*100`\%$ of metal songs were popular. On the other hand, $`r n_deephouse_popular`/26 = `r prop_deephouse_popular` = `r prop_deephouse_popular*100`\%$ of deep house songs were popular. Let's next compare these two values. It appears that metal tracks were popular at a rate that was $`r prop_metal_popular ` - `r prop_deephouse_popular ` = `r diff_prop` = `r diff_prop*100`\%$ different than deep house songs.
+So in this one sample of a hypothetical universe of no difference in genre popularity, $`r n_metal_popular`/26 = `r prop_metal_popular` = `r prop_metal_popular*100`\%$ of metal songs were popular. On the other hand, $`r n_deephouse_popular`/26 = `r prop_deephouse_popular` = `r prop_deephouse_popular*100`\%$ of deep house songs were popular. Let's next compare these two values. It appears that metal tracks were popular at a rate that was $`r prop_metal_popular ` - `r prop_deephouse_popular ` = `r diff_prop`$, or `r diff_prop*100` percentage points, different from that of deep house songs.
 
 Observe how this difference in rates is not the same as the difference in rates of `r observed_test_statistic` = `r observed_test_statistic*100`% we originally observed. This is once again due to *sampling variation*. How can we better understand the effect of this sampling variation? By repeating this shuffling several times!
 
@@ -788,9 +788,9 @@ sampling_scenarios |>
 
 So, based on our sample of $n_m = 1000$ metal tracks and $n_f = 1000$ deep house tracks, the *point estimate* for $p_{m} - p_{d}$ is the *difference in sample proportions*
 
-$$\widehat{p}_{m} -\widehat{p}_{f} = `r p_metal_popular` - `r p_deephouse_popular` = `r observed_test_statistic` = `r observed_test_statistic*100`\%$$. 
+$$\widehat{p}_{m} -\widehat{p}_{d} = `r p_metal_popular` - `r p_deephouse_popular` = `r observed_test_statistic`$$. 
 
-This difference in favor of metal songs of `r observed_test_statistic` is greater than 0, suggesting metal songs are more popular than deep house songs.
+This difference in favor of metal songs of `r observed_test_statistic` (`r observed_test_statistic*100` percentage points) is greater than 0, suggesting metal songs are more popular than deep house songs.
 
 However, the question we ask ourselves was "is this difference meaningfully greater than 0?". In other words, is that difference indicative of true popularity, or can we just attribute it to *sampling variation*? Hypothesis testing allows us to make such distinctions.