FMPH-Homework-4.Rmd

---
title: "FMPH-Homework-4"
author: "Hongjie (Harry) Qian"
date: "2023-11-17"
output: pdf_document
---

```{r message=FALSE, warning=FALSE}
library(alr4)
```

# Question 1 (Q7.6)

**Question** (Data file: `stopping`) The (hypothetical) data in the file give automobile stopping `Distance` in feet and `Speed` in mph for *n* = 62 trials of various automobiles (Ezekiel and Fox, 1959).

## 7.6.1

Draw a scatterplot of `Distance` versus `Speed`. Explain why this graph supports fitting a quadratic regression model.\
**Answer**:

```{r, fig.height=4, fig.width=5, fig.align='center'}
plot(Distance~Speed, data=stopping, xlab="Speed (mph)", 
     ylab="Stopping distance (in feet)", pch=16)
grid(lty="solid")
```

Reason why this graph supports fitting a quadratic regression model:

1.  **Curvature**: based on the shape of the data points on this scatterplot, there is an obvious curvature in a near-parabolic manner.

2.  **Residual analysis**: if we fit a simple linear regression to this scatterplot:

    ```{r, fig.height=4, fig.width=5, fig.align='center'}
    lm_761 <- lm(Distance~Speed, data=stopping)
    plot(lm_761,1)
    lmtest::bptest(lm_761)
    car::ncvTest(lm_761)
    ```

    we can see that based on the residuals versus fitted value plot, there is an obvious curvature on the curve; meanwhile, Breusch-Pagan test `bptest()`, a test against heteroscedasticity, has been proven to be significant enough (*p*-value = 0.0001376) to reject the null hypothesis, where the variance of residuals is constant. Similarly, the score test for non-constant error variance `ncvTest()` could reject the homoscedasticity null hypothesis as well with a *p*-value of 5.9822e-06, which is significant enough to reject *H*~0~.

## 7.6.2

Fit the quadratic model but with constant variance. Compute the score test for non-constant variance for the alternatives that (a) variance depends on the mean; (b) variance depends on `Speed`; and (c) variance depends on `Speed` and `Speed`^2^. Is adding `Speed`^2^ helpful?\
**Answer**:

```{r, fig.height=4, fig.width=5, fig.align='center'}
# Quadratic model fitting
lm_762 <- lm(Distance ~ Speed + I(Speed^2), data=stopping)
summary(lm_762)
plot(lm_762,1)
# non-constant variance test
car::ncvTest(lm_762) ## ncvTest dependent on the mean
car::ncvTest(lm_762, ~ Speed) ## ncvTest dependent on `Speed`
car::ncvTest(lm_762, ~ Speed + I(Speed^2)) ## ncvTest dependent on both
```

Therefore, the score test for non-constant variance for the alternatives that

-   a\. variance depends on the mean: 22.97013

-   b\. variance depends on `Speed`: 23.39216

-   c\. variance depends on `Speed` and `Speed`^2^: 23.46559

Therefore, with `Speed`^2^ (`I(Speed^2)`) taken into consideration, the degree of freedom (`Df`), which is the number of transformations, was increased by 1, representing the increase of model complexity; meanwhile, the test score (`Chisquare`) was also increased by 23.46559 - 23.39216 = 0.07343. This showed the improvement in the non-constant variance test and hence, adding `Speed`^2^ is helpful.

Meanwhile, based on the residual plot, we can say after the inclusion of quadratic term, the heteroscedasticity problem was alleviated.

## 7.6.3

Refit the quadratic regression model assuming Var(`Distance`\|`Speed`) = `Speed`$\sigma^2$. Compare the estimates and their standard errors with the un-weighted case.\
**Answer**:

```{r, fig.height=4, fig.width=5, fig.align='center'}
lm_763 <- lm(Distance ~ Speed + I(Speed^2), 
             data = stopping, weights = 1/Speed)
compareCoefs(lm_762,lm_763)
plot(lm_763,1)
```

Comparison between two models: the key difference between the establishment of two models is whether to include the weights of `Speed`, and this led to all the distinctions in coefficient estimates and standard errors.

-   For the intercept: with the weight added, the estimate value of intercept decreased from 1.58 to 1.33, and its standard error (SE) decreased from 5.10 to 3.10.

    -   With weight `1/Speed` included ​​in a regression model, data points with higher values ​​of `Speed` ​​are given lower weights. This may caused the model to place greater weight on data points in low `Speed` regions. Since these low `Speed` data points have relatively small `Distance` values as seen from the scatterplot, then the weighted regression may tend to lower the intercept to better fit these points.

-   For the `Speed` term: with the weighted added, the estimate value of `Speed` coefficient increased from 0.416 to 0.448, and its SE decreased as well from 0.556 to 0.421.

    -   Similarly, the introduction of weights changes the contribution of data points to the linear regression fit. Since `Distance` increases at a faster rate at high `Speed` than at low `Speed` as seen from the scatterplot as well, reducing the weight of the high `Speed` points may cause the linear regression to tend to fit the data more in the lower `Speed` region, which may require a larger `Speed` ​​coefficient as compensation.

-   For the `I(Speed^2)` term: with the weighted added, the estimate value of `I(Speed^2)` decreased slightly from 0.0656 to 0.0648, and its SE decreased from 0.0130 to 0.0112.

    -   For the coefficient estimate of the quadratic term `I(Speed^2)`, the direction of change will depend on how the weights affect nonlinear patterns in the data. Since the weighting makes the nonlinear pattern more dependent on the low `Speed` region, the quadratic term coefficients may also need to be adjusted to accommodate this pattern.

-   For the SEs in summary: the [decrease]{.underline} in the SE values of all coefficients indicates that the uncertainty in the coefficient estimates decreases after the introduction of the weight `1/Speed` because weighting improves regression model fitting, especially when the original model is affected by heteroscedasticity. Weighting stabilizes the residual variance, and therefore improves coefficient estimate accuracy. Meanwhile, based on the residual plot, we can say after the inclusion of weighting, the heteroscedasticity problem was alleviated as well.

## 7.6.4

Based on the un-weighted model, use a sandwich estimator of variance to correct for non-constant variance. Compare with the results of the last subproblem.\
**Answer**:

```{r message=FALSE, warning=FALSE}
library(lmtest)
# Original un-weighted OLS model
signif(coeftest(lm_762, vcov. = vcov(lm_762)), digits = 3)
# Sandwich estimator
sigma = hccm(lm_762, type="hc3")
lm_764 <- signif(coeftest(lm_763, vcov. = sigma), digits = 3)
lm_764

# calculate directly
hccm(lm_762, type="hc3")
sqrt(diag(hccm(lm_762, type="hc3")))
```

Therefore, we compare the results of coefficients in the linear regression model:

-   For all three coefficients, their estimates are quite different between the original un-weighted model and the sandwich estimator employed model since sandwich estimator calculate the variances via correcting heteroscedasticity in a robust manner.

-   While for the SEs of the coefficient estimates:

    -   The SE value of the intercept term decreased from 5.10 to 4.30, indicating the decrease of the uncertainty in the intercept estimate when considering heteroscedasticity.

    -   The SE values of the `Speed` term and `I(Speed^2)` term both increased from 0.556 to 0.630 and from 0.0130 to 0.0172, respectively. This showed the decrease of the uncertainty in the estimates of the `Speed` term and `I(Speed^2)` term when considering heteroscedasticity.

    -   However, when compared with both the `OLS` model and the `WLS` model, we can see that the estimate of intercept was between the `OLS` model and the `WLS` model and the estimates of the `Speed` term and `I(Speed^2)` term were both higher than both the `OLS` model and the `WLS` model.

        -   This implies that the sandwich estimator might lean more on decreasing the uncertainity of the intercept when it intended to correct the heteroscedasticity of the regression model.

## 7.6.5

Fit the un-weighted quadratic model, but use a case re-sampling bootstrap to estimate standard errors, and compare with the previous methods.\
**Answer**:

```{r}
# Bootstrap of the SE estimates
B = 1000
n = nrow(stopping)
coef.boot <- matrix(NA, nrow = B, ncol = 3)
for (b in 1:B) {
  indices = stopping[sample(n, replace = TRUE), ]
  m.boot = lm(Distance ~ Speed + I(Speed^2), data=indices)
  coef.boot[b, ] <- coef(m.boot)
}
boot_se <- apply(coef.boot, 2, sd)
boot_se

# SE data summary
se_values_OLS <- signif(summary(lm_762)$coefficients[, "Std. Error"], digits=3)
se_values_WLS <- signif(summary(lm_763)$coefficients[, "Std. Error"], digits=3)
se_values_sandwich <- lm_764[, "Std. Error"]
SE <- data.frame(
  OLS = se_values_OLS,
  WLS = se_values_WLS,
  Sandwich = se_values_sandwich,
  Bootstrap = signif(boot_se, digits=3)
)
library(knitr)
kable(SE)
```

Therefore, we can get the result that the bootstrap results of the SE estimates are all similar to the sandwich estimators of the SE numerically, though they handled the heteroscedasticity in a very different mechanism.

Interpretations:

-   The reason why they provide similar results is that they are both empirically-based approach in estimating the SE values in a relatively robust manner.

-   Meanwhile, the estimates of Bootstrap and sandwich estimator are different from the `WLS` method because their estimation mechanisms are different:

    -   Bootstrap and sandwich estimator rely on empirical approach: Bootstrap relies on empirical distribution and sandwich estimator uses the residuals from the actual data to adjust the estimate of the standard error.

    -   While `WLS` relies on adjusting the weights of the variances of the observations. These weights are based on assumptions about the variance of the residuals to obtain the homoscedasticity among the data. The effectiveness of `WLS` is dependent on the accuracy of the weight selection.

# Question 2 (Q7.8)

**Question** (Data file: `jevons`) The data in this example are deduced from a diagram in Jevons (1868) and provided by Stephen M. Stigler. In a study of coinage, Jevons weighed 274 gold sovereigns that he had collected from circulation in Manchester, England. For each coin, he recorded the weight after cleaning to the nearest 0.001 g, and the date of issue. The data file includes `Age`, the age of the coin in decades, `n`, the number of coins in the age class, `Weight`, the average weight of the coins in the age class, `SD`, the standard deviation of the weights. The minimum `Min` and maximum `Max` of the weights are also given. The standard weight of a gold sovereign was 7.9876 g; the minimum legal weight was 7.9379 g.

## 7.8.1

Draw a scatterplot of `Weight` versus `Age`, and comment on the applicability of the usual assumptions of the linear regression model. Also draw a scatterplot of `SD` versus `Age`, and summarize the information in this plot.\
**Answer**:

```{r, fig.height=4, fig.width=5, fig.align='center'}
# Weight ~ Age
plot(Weight ~ Age, data = jevons, xlab = "Age of coins (decades)", 
     ylab = "Average weight (g)", pch=16)
```

```{r, fig.height=4, fig.width=5, fig.align='center'}
# SD ~ Age
plot(SD ~ Age, data = jevons, xlab = "Age of coins (decades)", 
     ylab = "Standard deviation", pch=16)
lm_781 <- lm(SD ~ Age, data = jevons)
qqPlot(lm_781)
```

Based on the `SD` \~ `Age` plot, we can see for different `Age` values, the corresponding standard deviations are vastly different and when the age of coins is larger, the variability, seen from `SD`, is larger. There is a clear linearity seen from the shape of the `SD` \~ `Age` scatterplot.

The common assumptions of linear regression model include:

-   Linearity: satisfied here;

-   Normality: NOT satisfied based on the Q-Q plot;

-   Homoscedasticity: **NOT** satisfied.

Therefore, we could use a `WLS` fitting for the linear regression model to address with the heteroscedasticity problem.

## 7.8.2

To fit a simple linear regression model with `Weight` as the response, `WLS` should be used with variance function Var(`Weight`\|`Age`) = `n`$\sigma^2$/`SD`^2^. Sample sizes are large enough to assume the `SD` are population values. Fit the `WLS` model.\
**Answer**:

```{r}
head(jevons)
lm_782 <- lm(Weight ~ Age, data=jevons, weights = SD^2/n)
summary(lm_782)
```

In the `WLS` fitting of the linear regression model, we get that if we set the weights as $n/\mathrm{SD}^2$, the estimated residual standard error is 1.738$\times 10^{-5}$, far less that 1, indicating that the difference between actual observations and model predictions is smaller than expected.

## 7.8.3

Is the fitted regression consistent with the known standard weight for a new coin?\
**Answer**:\
When it comes to a new coin, its `Age` will be equal to 0. Therefore,

```{r}
Weight_age_0_pre <- signif(predict(lm_782, data.frame(Age=0), 
                                   interval = "confidence"), digits=5)
Weight_age_0_pre
Weight_age_0 = 7.9876
```

Based on the prediction of the new coin weight, we find the standard weight of a gold sovereign, which is 7.9876 g, is covered between the lower and upper limit of the 95% confident predicted range (7.9849, 8.0205). Therefore, the fitted regression is approximately consistent with the known standard weight for a new coin.

## 7.8.4

For previously un-sampled coins of `Age` = 1, 2, 3, 4, 5, estimate the probability that the weight of the coin is less than the legal minimum.\
(*Hints*: The standard error of prediction is the square root of [the sum of two terms]{.underline}, the assumed known variance of an un-sampled coin of known `Age`, which is different for each age, and the estimated variance of the fitted value for that `Age`; the latter is computed from the formula for the variance of a fitted value. You should use the normal distribution rather than a *t* to get the probabilities.)\
**Answer**:

```{r}
Weight_age_standard = 7.9876
Weight_legal_minimum = 7.9379
predict_weight <- predict(lm_782, data.frame(Age=1:5), se.fit=TRUE)
summary(predict_weight)
predict_weight_se <- sqrt(jevons$SD^2 + predict_weight$se.fit^2) ## sum of 2 SDs
z_score <- (predict_weight$fit - Weight_legal_minimum)/predict_weight_se
p_value <- 1 - pnorm(z_score)
SE_weight <- data.frame(
  Age = 1:5,
  predicted_weight = predict_weight$fit,
  predict_weight_se,
  z_score,
  p_value
)
library(knitr)
kable(SE_weight)
```

Therefore, three decades years or older coins, whose `z_score` is less than 0, are likely to have a weight less than the legal minimum (7.9379 g).

## 7.8.5

Determine the `Age` at which the predicted weight of coins is equal to the legal minimum, and use the delta method to get a standard error for the estimated age. This problem is called *inverse regression*, and is discussed by Brown (1993).\
**Answer**:

```{r}
signif(deltaMethod(lm_782, "(Weight_legal_minimum-(Intercept))/Age"), digits=4)
```

Therefore, the `Age` at which the predicted weight of coins is equal to the legal minimum (7.9379 g) is 2.48 decades, and the standard error is 0.0966.

# Question 3 (Q7.10)

(Data file: `fuel2001`)

## 7.10.1

Use the bootstrap to estimate confidence intervals for the coefficients in the fuel data, and compare the results with the usual large sample `OLS` estimates.\
**Answer**:

We first obtain the large sample `OLS` estimates of the CIs for the coefficients.

```{r message=FALSE, warning=FALSE}
# OLS CI estimates
fuel2001$Dlic <- 1000*fuel2001$Drivers/fuel2001$Pop
fuel2001$fuel <- 1000*fuel2001$FuelC/fuel2001$Pop
fuel2001$income <- fuel2001$Income/1000
lm_7101 <- lm(fuel ~ Tax + Dlic + income + log(Miles), data=fuel2001)
ci_ols <- confint(lm_7101, level = 0.95)
ci_ols
```

Then, we use the Bootstrap method for confidence interval estimation. First, resampling from scratch is employed.

```{r}
# Bootstrap CI estimates
## with nonparametric bootstrap: resampling from scratch
B = 1000
n <- nrow(fuel2001)
coef_boot <- matrix(0, B, 5)
set.seed(123)
for (b in 1:B) {
    indices = sample(seq(1,n), replace=T)
    boot_model = lm(fuel[indices] ~ Tax[indices] + Dlic[indices] 
                    + income[indices] + log(Miles)[indices], data=fuel2001)
    coef_boot[b,] = boot_model$coefficients
}
quantile(coef_boot[,1], probs = c(0.025, 0.975))
quantile(coef_boot[,2], probs = c(0.025, 0.975))
quantile(coef_boot[,3], probs = c(0.025, 0.975))
quantile(coef_boot[,4], probs = c(0.025, 0.975))
quantile(coef_boot[,5], probs = c(0.025, 0.975))
```

We also use the residual parametric bootstrap:

```{r}
## with residual parametric bootstrap:
B = 1000
n = nrow(fuel2001)
coef.boot <- matrix(0, B, 5)
set.seed(123)
for (b in 1:B) {
  indices = sample(seq(1,n), replace=T)
  regressor.boot = lm_7101$fitted.values + lm_7101$residuals[indices]
  model.boot = lm(regressor.boot ~ Tax + Dlic + income 
                  + log(Miles), data=fuel2001)
  coef.boot[b,] = model.boot$coefficients
}
quantile(coef_boot[,1], probs = c(0.025, 0.975))
quantile(coef_boot[,2], probs = c(0.025, 0.975))
quantile(coef_boot[,3], probs = c(0.025, 0.975))
quantile(coef_boot[,4], probs = c(0.025, 0.975))
quantile(coef_boot[,5], probs = c(0.025, 0.975))
```

From the results above, we could see that:

-   Between the two bootstrap methods we use: we can see that the obtained confidence intervals of the estimated regressor coefficient values from their results are the same.

-   Between `OLS` and Bootstrap: we can see differences between the estimated confidence intervals for the coefficients in the fuel data -

    -   For `(Intercept)`: the Bootstrap CI has a smaller lower limit and a larger upper limit than the `OLS` CI.

    -   For `Tax`: the Bootstrap CI has a smaller lower limit and a larger upper limit than the `OLS` CI.

    -   For `Dlic`: the Bootstrap CI has a smaller lower limit and a larger upper limit than the `OLS` CI.

    -   For `income`: the Bootstrap CI has a higher lower limit and a smaller upper limit than the `OLS` CI.

    -   For `log(Miles)`: the Bootstrap CI has a smaller lower limit and a larger upper limit than the `OLS` CI.

-   Interpretation: the Bootstrap CI is quite inconsistent with the `OLS` CI.

## 7.10.2

Examine the histograms of the bootstrap replications for each of the coefficients. Are the histograms symmetric or skewed? Do they look like normally distributed data, as they would if the large sample normal theory applied to these data? Do the histograms support or refute the differences between the bootstrap and large sample confidence intervals found in Problem 7.10.1?\
**Answer**:

```{r, fig.height=12, fig.width=8, fig.align='center'}
par(mfrow=c(3,2))  
for (i in 1:5) {
    # plot the histogram
    hist(coef_boot[,i], breaks=50, probability = TRUE,
         main=paste("Histogram of Coefficient", i), 
         xlab="Coefficient Value", col="lightblue")

    # calculate the mean and variance of the boot dist
    mean_val <- mean(coef_boot[,i])
    sd_val <- sd(coef_boot[,i])

    # add normal dist curve for comparison
    curve(dnorm(x, mean=mean_val, sd=sd_val), add=TRUE, col="plum", lwd=2)
}
```

Therefore,

-   Histograms symmetric or skewed?

    -   `(Intercept)`: right-skewed due to the longer right tail;

    -   `Tax`: looks symmetric;

    -   `Dlic`: looks symmetric and no visible skewness;

    -   `income`: looks symmetric;

    -   `log(Miles)`: slightly left-skewed but mostly symmetric.

-   Look like normally distributed data?

    -   `(Intercept)`: deviate slightly from normality;

    -   `Tax`: look like normal distributed data;

    -   `Dlic`: look like normal distributed data;

    -   `income`: look like normal distributed data;

    -   `log(Miles)`: generally look like normal distributed data.

-   Support differences between confidence intervals? Yes, the slight skewnesses in the histograms are caused by the differences between the bootstrap and large sample confidence intervals found in Problem 7.10.1, and the visualization of bootstrap replications via histograms validates such differences.

# Question 4 (Q9.16)

**Question** (**Florida election 2000** Data file: `florida`) In the 2000 election for U.S. president, the counting of votes in Florida was controversial. In Palm Beach County in south Florida, for example, voters used a so-called butterfly ballot. Some believe that the layout of the ballot caused some voters to cast votes for Buchanan when their intended choice was Gore.

The data from Smith (undated) has four variables, `County`, the county name, and `Gore`, `Bush`, and `Buchanan`, the number of votes for each of these three candidates. Draw the scatterplot of `Buchanan` versus `Bush`, and test the hypothesis that Palm Beach County is an outlier relative to the simple linear regression mean function for E(`Buchanan`\|`Bush`). Identify another county with an unusual value of the Buchanan vote, given its Bush vote, and test that county to be an outlier. State your conclusions from the test, and its relevance, if any, to the issue of the butterfly ballot.

Next, repeat the analysis, but first consider transforming the variables in the plot to better satisfy the assumptions of the simple linear regression model. Again test to see if Palm Beach County is an outlier, and summarize.\
**Answer**:

Here we set the significance level in both outlier tests after Bonferroni correction as *p*-value = 0.05.

-   Non-transformation outlier identification

    -   We first examine the data structure of the original data and plot the scatterplot of the data-points based on the original data. For convenience, the row names are extracted for county name annotation.

        ```{r, fig.height=6, fig.width=8, fig.align='center'}
        head(florida)
        plot(Buchanan ~ Bush, data=florida, xlab="Vote for Bush", 
             ylab="Vote for Buchanan", pch=16)
        row_names <- rownames(florida)
        text(florida$Bush, florida$Buchanan, labels = row_names, pos = 1, cex = 0.8)
        ```

    -   Then, a simple linear model is constructed with the original data. We also plot the residual plot and Q-Q plot for linear regression assumption verification. Based on these two plots, we found that "PALM BEACH" and "DADE" could be two possible outliers. Meanwhile, these two plots indicate that the linear regression assumptions are not satisfied and hence, we can use transformation to better satisfy the assumptions of the SLR model, which is in the next section.\
        Which is, Palm Beach County is an outlier relative to the simple linear regression mean function for E(`Buchanan`\|`Bush`), and the another county with an unusual value of the Buchanan vote is Dade County.

        ```{r, fig.height=4, fig.width=5, fig.align='center'}
        lm_916 <- lm(Buchanan ~ Bush, data = florida)
        plot(lm_916, 1)
        qqPlot(lm_916)
        ```

    -   Last, the outlier test is conducted and found that based on the Bonferroni *p*-value, "PALM BEACH" is the outlier, whose Bonferroni *p*-value is 5.78 $\times$ 10^-32^, which is significant enough to reject the null hypothesis *H*~0~ that the data point is not an outlier. Meanwhile, Dade County is not an outlier based on the outlier test here.

        ```{r}
        outlier <- outlierTest(lm_916, cutoff = 0.05)
        outlier
        ```

    Based on the SLR model construction and outlier test of the original data, we find the non-transformation linear regression model has flaws in meeting the basic regression assumptions and Palm Beach is the outlier under this circumstance where we use the original data and set the threshold of the outlier test *p*-value as 0.05. We also find Dade County could be a possible outlier based on the residual plot and the Q-Q plot but it dose not give significant result in the outlier test.

-   Log transformation: to better satisfy the assumptions of the SLR model, we use log transformation in this section.

    -   For convenience, we set three new variables in the `florida` dataset for the log transformation of the three variables (`Buchanan`, `Bush`, and `Gore`) as `logBuchanan`, `logBush`, and `logGore`.

        ```{r}
        florida$logBuchanan <- log(florida$Buchanan)
        florida$logBush <- log(florida$Bush)
        florida$logGore <- log(florida$Gore)
        ```

    -   Then we plot the scatterplot of the log-transformed data similarly and give similar annotations.

        ```{r}
        plot(logBuchanan ~ logBush, data=florida, xlab="Log Vote for Bush", 
             ylab="Log Vote for Buchanan", pch=16)
        row_names <- rownames(florida)
        text(florida$logBush, florida$logBuchanan, labels = row_names, pos = 1, cex = 0.8)
        ```

    -   Employing the log-transformed data, a similar SLR model is constructed and the residual plot and Q-Q plot show that "PALM BEACH" could still be the possible outlier.

        ```{r, fig.height=4, fig.width=5, fig.align='center'}
        lm_9162 <- lm(logBuchanan ~ logBush, data = florida)
        plot(lm_9162, 1)
        qqPlot(lm_9162)
        ```

    -   An outlier test find that "PALM BEACH" is still the outlier, whose Bonferroni *p*-value, after the log transformation of the original data, is 8.93 $\times$ 10^-3^, which is significant enough to reject the null hypothesis *H*~0~ that the data point is not an outlier.

        ```{r}
        outlier_2 <- outlierTest(lm_9162, cutoff = 0.05)
        outlier_2
        ```

    To conclude, after log transformation the basic assumptions of linear regression are better satisfied and the outlier under this circumstance, where we use the log-transformed data and set the threshold of the outlier test *p*-value as 0.05, is still Palm Beach.

Therefore, overall, the outlier of the dataset, despite whether we do log transformation on the dataset or not, is Palm Beach County.

"Butterfly ballot" is a specific design of ballot was most famously used in Palm Beach County in 2000 US Presidential Election. This design could result in some voters who wanted to vote for Al Gore (R) accidentally voting for Pat Buchanan (Reform Party). This led to Buchanan receiving an unusually high number of votes in some districts, far more than he did in other similar districts. The unusually high vote total in Palm Beach County was identified as an outlier because it significantly deviates from expected voting patterns or results in other similar areas.