Probit, Logit, and LPM models for binary prediction, with data generated to show their relative merits
The LPM follows the closed-form OLS method, while probit and logit obtain their coefficients by maximizing their respective log-likelihood functions (equivalently, minimizing the negative log-likelihood). Data were generated with logistic and normal error terms to compare the performance of the models.
Paper by: Nate Tollet, Abe Burton, Nicholas Oxendon
Code By: Abe Burton, Nicholas Oxendon
Binary prediction is a common econometric task that can be approached in several ways. It is useful in a wide variety of scenarios where an estimated probability can be turned into a prediction of an outcome. Several methods have been developed for this type of question, each with its own strengths, weaknesses, and assumptions. Our purpose is to build the most common of these tools and test their performance on data that challenges their assumptions.
One possible way to handle a binary dependent variable is through a linear probability model (LPM). This is done by essentially running an OLS regression of the binary outcome on the covariates,

$$y_i = x_i'\beta + \varepsilon_i,$$

and interpreting the fitted value $x_i'\hat{\beta}$ as the predicted probability that $y_i = 1$. Because nothing constrains these fitted values to the unit interval, the LPM can produce predicted probabilities below 0 or above 1, and its error term is necessarily heteroskedastic.
An effective way to address the issues of linear probability estimation is MLE estimation. MLE is a type of M-estimator, where we maximize a likelihood function representing the joint distribution of the observed data given our parameters. For computational reasons, we often work with the log-likelihood function; for a binary outcome with $P(y_i = 1 \mid x_i) = F(x_i'\beta)$, it takes the form

$$\ell(\beta) = \sum_{i=1}^{n} \Big[ y_i \ln F(x_i'\beta) + (1 - y_i) \ln\big(1 - F(x_i'\beta)\big) \Big].$$
Like other M-estimators, the MLE estimator is consistent and asymptotically normal. To perform MLE estimation, we need to make an assumption about the underlying distribution of our data. We will look at two options: probit and logit estimation.
Probit estimation assumes that our underlying distribution is standard normal, so $F$ is the standard normal CDF:

$$F(x_i'\beta) = \Phi(x_i'\beta) = \int_{-\infty}^{x_i'\beta} \frac{1}{\sqrt{2\pi}} e^{-t^2/2}\, dt.$$
Logit estimation works similarly to probit, but instead of the standard normal CDF, our underlying distribution is assumed to be logistic:

$$F(x_i'\beta) = \Lambda(x_i'\beta) = \frac{e^{x_i'\beta}}{1 + e^{x_i'\beta}}.$$
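To make this concrete, the sketch below shows how logit and probit coefficients can be obtained by minimizing the negative of the log-likelihood above. This is an illustrative version under our own naming assumptions (the helpers `neg_log_likelihood` and `fit_mle`, and the use of `scipy.optimize.minimize`), not necessarily the exact code used in the paper.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def neg_log_likelihood(beta, X, y, cdf):
    """Negative binary log-likelihood for a given link CDF F."""
    p = cdf(X @ beta)
    p = np.clip(p, 1e-10, 1 - 1e-10)  # avoid log(0) at boundary predictions
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))


def fit_mle(X, y, link="logit"):
    """Estimate beta by numerically minimizing the negative log-likelihood."""
    cdf = norm.cdf if link == "probit" else (lambda z: 1.0 / (1.0 + np.exp(-z)))
    beta0 = np.zeros(X.shape[1])
    result = minimize(neg_log_likelihood, beta0, args=(X, y, cdf), method="BFGS")
    return result.x
```

The only difference between the two estimators is the CDF plugged into the likelihood: the standard normal CDF for probit and the logistic CDF for logit.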
The linear probability model has been the subject of some criticism among researchers concerning its usefulness for estimating binary outcomes in RCTs. A similar question exists for its usefulness with quasi-experimental data, which in reality make up the majority of what is available for estimating causal effects. John Deke from the HHS explains in a brief why the linear probability model is appropriate in both situations. The linear probability model, of course, uses a binary outcome in place of a continuous one in OLS, and the estimate can be interpreted as a probability. Deke notes that while logistic regression (logit) and the linear probability model both yield the same average treatment effect, the generality of the linear probability model has certain advantages, which he illustrates with a sample regression of predicted probabilities. The estimates given by the linear probability model can be directly interpreted as mean marginal effects, while those of the logistic model cannot. This makes them more intuitively interpretable, producing estimates that can be easily shared with and understood by a broader audience. The drawback is that the LPM misses the true nonlinear relationship between a binary outcome and a continuous covariate. There then remains the possibility that the linear probability model will produce nonsense predictions of less than 0 percent or more than 100 percent.
The empirical evidence given by Deke suggests that perhaps we should not be overly concerned with the linear probability model giving nonsense estimates. Using Monte Carlo simulations, he shows that the linear probability model performs as well as or better than logistic regression in most scenarios. He summarizes these in four key findings: first, if treatment perfectly predicts the outcome, logistic regression will fail to appropriately estimate the impact, while the linear probability model can. Second, he found that the linear probability model faced issues of bias in far fewer cases than logistic regression. Third, and a disadvantage for the linear probability model, logistic regression is typically more precise. Finally, and perhaps a reason to discount the third finding, the standard error given by the linear probability model is far more often correct than the one given by logistic regression, whose standard error is often too small. Linear probability models seem to sacrifice some precision for interpretability.

Logistic regression, or logit, could be considered a sister model to the linear probability model. Both are interpreted similarly, but logit provides a better approximation of the true nonlinear form that a probabilistic function necessarily takes. Unlike the linear probability model, it will not produce predicted probabilities below 0 percent or above 100 percent. And, perhaps an advantage over probit regression, logit's tails are fatter, approaching the limits of 0 and 100 percent more slowly and leaving room for more extreme values. Stone and Rasp (1991) use accounting choice studies to gather evidence on the utility of logit regression. Their key findings show that even in samples as small as 50 observations, logit may still be preferable to OLS, although with such a small sample the results are likely to be biased. Thus, even in a small sample, logit may outperform an OLS estimate of a binary outcome variable.
This confirms some preconceptions while also setting bounds on logit's usefulness. In conjunction with Noreen (1988), Stone and Rasp conclude that researchers' common assumption that their sample size is sufficient for logit is often not true. After running 10,000 replications of a model with one predictor, they conclude that at least 50 observations may be needed for logit to outperform OLS, and 100 observations when there is skewness in the predictor. This jumps to 200 observations in a model with skewness and multiple predictors. In cases where the sample size does not exceed these bounds, OLS test statistics might be better calibrated, but we may still be willing to sacrifice some accuracy for flexibility. Even in cases where logit seems to be outperformed, it is likely to lead to lower misclassification rates, fewer meaningless predictions, and more powerful tests of parameter estimates. Researchers therefore find that logit, while biased in small samples, is a better classifier and provides more useful estimates than OLS. We see fewer nonsense predictions than with the linear probability model, and it is computationally easier than probit. Its accuracy and test power increase with sample size. Logit may be the estimation method of choice for research with dichotomous outcomes.
Although logit and probit are nearly identical in their estimates, Noreen (1988) notes that probit has been the method of choice for researchers dealing with dichotomous outcomes, despite its computational inefficiency. Noreen's results for probit regression are nearly identical to those for logit in comparison with OLS. Using 1,000 Monte Carlo trials, he found that OLS seemingly performed better than probit in small samples (fewer than 50 observations). He concludes that in these scenarios the evidence does not support the use of probit over OLS. Like logit, probit increases in accuracy and test power as the sample size increases, but logit tends to be more useful than probit in small samples. Why, then, is probit so often used when it is computationally more difficult than logit and less powerful than OLS in small samples? There are a few reasons noted in the literature. First, the marginal effects in probit are more intuitive than those of logit because probit is based on the normal distribution; the linear probability model may be more intuitive still, but we may prefer probit for its nonlinearity and accuracy. Second, because the normal distribution has thinner tails than the logistic, probit approaches the bounds more quickly than logit, which may reduce the effect of outliers on our estimates; if we are concerned that extreme cases in the tails of the distribution may skew our inference, we may prefer probit to logit. Probit, although widely used by researchers in the accounting field as noted by Noreen, may only be more useful than logit or the linear probability model in the cases specified above. Otherwise, we may prefer logit when nonlinearity is more important, or the linear probability model if we seek interpretability.
To further test the properties and performance of the linear probability, logit, and probit models, we wrote our own implementations of all three estimators and fit them to data generated using a modified version of Julian Winkel and Tobias Krebs' Python package Opossum.
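As a simplified stand-in for that data-generating process (this is not the Opossum API, just an illustrative latent-index simulation with a hypothetical `simulate_binary` helper), a binary outcome can be generated with either a normal or a logistic error, matching the probit and logit assumptions respectively:

```python
import numpy as np


def simulate_binary(n, beta, error="normal", seed=0):
    """Draw a binary outcome from the latent-index model y = 1{X beta + e > 0}."""
    rng = np.random.default_rng(seed)
    # Design matrix: a constant column plus standard-normal covariates.
    X = np.column_stack([np.ones(n), rng.normal(size=(n, len(beta) - 1))])
    e = rng.logistic(size=n) if error == "logistic" else rng.normal(size=n)
    return X, (X @ beta + e > 0).astype(int)
```

For example, `X, y = simulate_binary(1000, np.array([0.5, -0.5, 0.25]), error="logistic")` produces a sample whose true error distribution favors logit.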
For the linear probability model, we used the classic closed-form OLS solution,

$$\hat{\beta} = (X'X)^{-1} X'y.$$
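A minimal sketch of that estimator (illustrative, not our exact code) is:

```python
import numpy as np


def fit_lpm(X, y):
    """Closed-form OLS coefficients: beta_hat = (X'X)^{-1} X'y."""
    return np.linalg.solve(X.T @ X, X.T @ y)
```

In practice `np.linalg.lstsq` is the more numerically stable choice, but the `solve` call mirrors the formula above directly.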
Our models for logit, probit, and LPM performed well on the generated data. We tested the validity of the models we built by comparing their results to those of the corresponding estimators in Python's scikit-learn (sklearn) package. Our coefficients and mean squared errors matched the results of those widely used implementations. The general performance of our models was encouraging, and the results of testing their assumptions against the data were for the most part consistent with what we expected.
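As an example of this kind of validation (a sketch reusing the hypothetical `fit_mle`, `fit_lpm`, and the simulated `X`, `y` from the snippets above; our actual comparison script may differ), the logit and LPM coefficients can be checked against scikit-learn as follows. Regularization must be switched off for `LogisticRegression` to match an unpenalized MLE, and `penalty=None` requires scikit-learn 1.2 or later; scikit-learn does not ship a probit estimator, so only logit and the LPM are shown here.

```python
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import mean_squared_error

# X already contains a constant column, so the sklearn intercepts are disabled.
logit_sk = LogisticRegression(penalty=None, fit_intercept=False).fit(X, y)
lpm_sk = LinearRegression(fit_intercept=False).fit(X, y)

print("sklearn logit:", logit_sk.coef_.ravel(), " ours:", fit_mle(X, y, link="logit"))
print("sklearn LPM:  ", lpm_sk.coef_, " ours:", fit_lpm(X, y))
print("LPM MSE:", mean_squared_error(y, X @ fit_lpm(X, y)))
```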
| Model / Data Type | Logit Model Performance | Probit Model Performance | LPM Model Performance |
|---|---|---|---|
| Data with normal error | R-squared: 0.53; β: [-.08, -.18, -.17] | R-squared: 0.51; β: [.48, -.03, -.03] | R-squared: 0.50; β: [.48, -.03, -.03] |
| Data with logistic error | R-squared: 0.525; β: [.62, -.059, .019] | R-squared: 0.502; β: [.64, -.01, .004] | R-squared: 0.523; β: [.64, -.01, .004] |
Consistent with the theory, logistic regression performed the best of the three when the data had a logistic error term. Probit and LPM gave similar results to each other, and both were worse than logit. Probit did better on the data with the normal error term than it did on the data with the logistic error term, which is also what we would have expected at the outset. However, logit still outperformed both other models on the data with normal error, where we would have thought probit would do better. This could be due to the phenomenon identified by Noreen that logit does better than probit with smaller sample sizes. For the most part, these results conform to our expectations of how the models should perform given the theory behind them.
By following an MLE and OLS framework, we were able to replicate results from models that are commonly used by the Python community. Our results support the theory that the distribution of the error term is an important factor in the performance of logit and probit. Even though logit did better in all cases, each model performed its individual best when matched to its theoretically preferred error distribution. Further research could test the circumstances that lead to logit's strong performance in this simulation. LPM is useful for its computational ease and interpretability, but it is not the best model because of its potential for nonsensical output and its slight disadvantage in predictive power in some cases. Our results are, for the most part, an encouraging affirmation of the theory that inspired this simulation research in the first place.
Deke, John. “Using the Linear Probability Model to Estimate Impacts on Binary Outcomes in Randomized Controlled Trials.” Mathematica, https://mathematica.org/publications/using-the-linear-probability-model-to-estimate-impacts-on-binary-outcomes-in-randomized.
Noreen, Eric. “An Empirical Comparison of Probit and OLS Regression Hypothesis Tests.” Journal of Accounting Research, vol. 26, no. 1, 1988, p. 119, https://doi.org/10.2307/2491116.
Rosett, Richard N., and Forrest D. Nelson. “Estimation of the Two-Limit Probit Regression Model.” Econometrica, vol. 43, no. 1, 1975, p. 141, https://doi.org/10.2307/1913419.
Stone, Mary, and John Rasp. “Tradeoffs in the Choice between Logit and OLS for Accounting Choice Studies.” American Accounting Association, Jan. 1991, https://www.jstor.org/stable/247712.