9.2 Conclusion

In this chapter, we learned how to use logistic regression to predict a binary outcome. The key distinction is that the outcome variable in logistic regression is binomial (e.g., yes/no, true/false), whereas the outcome variable in linear regression must be metric (and its residuals should be normally distributed). The model's output includes an estimated coefficient for each predictor variable. When coefficients are expressed as log-odds, each one tells us how much a one-unit change in the predictor changes the log-odds of the outcome. We can test the null hypothesis that any given coefficient is 0 in the population by applying conventional significance tests to its log-odds estimate. To determine whether a model's reduction in error is significant, we can also compare the chi-square value of a null model to that of a model containing one or more predictors. The null hypothesis here is that including the predictor or predictors does not improve the model; we reject it if the change in chi-square is sufficiently large.
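As a minimal sketch of these steps, the following Python code fits a logistic regression with statsmodels on synthetic data; the variables x and y and the simulated coefficient values are illustrative assumptions, not material from the chapter. It prints the log-odds coefficients with their z-tests and then exponentiates them into odds ratios:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)                    # one metric predictor (illustrative)
p = 1 / (1 + np.exp(-(-0.5 + 1.2 * x)))   # inverse-logit of the true log-odds
y = rng.binomial(1, p)                    # binary (binomial) outcome

X = sm.add_constant(x)                    # design matrix with an intercept
fit = sm.Logit(y, X).fit(disp=False)
print(fit.summary())                      # coefficients are on the log-odds scale;
                                          # the z-tests assess H0: coefficient = 0
print(np.exp(fit.params))                 # exponentiate log-odds into odds ratios
```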

We also learned how to use Bayesian estimation to obtain the posterior distribution of the parameters in a logistic regression model, and how to use that posterior to estimate the odds ratio for each predictor. By employing the Markov chain Monte Carlo (MCMC) approach, we can draw a large number of candidate parameter values, which together form a posterior distribution for each coefficient.
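To make the MCMC idea concrete, here is a hand-rolled random-walk Metropolis sampler in Python. This is a sketch only; real analyses would typically use a library such as PyMC or Stan, and the synthetic data, Normal(0, 5) priors, and step size are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
x = rng.normal(size=n)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.3 + 0.8 * x))))  # simulated outcome

def log_posterior(beta):
    """Log of prior times likelihood for an intercept-plus-slope logistic model."""
    eta = beta[0] + beta[1] * x
    log_lik = np.sum(y * eta - np.log1p(np.exp(eta)))     # Bernoulli log-likelihood
    log_prior = -0.5 * np.sum(beta ** 2) / 5 ** 2         # Normal(0, 5) priors (assumed)
    return log_lik + log_prior

beta = np.zeros(2)
current = log_posterior(beta)
samples = []
for _ in range(20_000):
    proposal = beta + rng.normal(scale=0.1, size=2)       # random-walk proposal
    cand = log_posterior(proposal)
    if np.log(rng.uniform()) < cand - current:            # Metropolis accept/reject
        beta, current = proposal, cand
    samples.append(beta.copy())                           # keep current state either way

samples = np.array(samples[5_000:])                       # discard burn-in draws
print(np.exp(samples[:, 1]).mean())                       # posterior mean odds ratio
print(np.percentile(np.exp(samples[:, 1]), [2.5, 97.5])) # 95% credible interval
```

Exponentiating each sampled slope turns the posterior over log-odds into a posterior over odds ratios, which is where the interval above comes from.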

Either the chi-square value or the AIC of a particular model can be used to measure goodness of fit in logistic regression. The chi-square comparison applies to nested models (where one model is a subset of another), whereas AIC can also be used to compare non-nested models.
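As an illustration of both measures, the following sketch (again with synthetic data and statsmodels; nothing here comes from the chapter) compares a null model to a two-predictor model with a likelihood-ratio chi-square test and reports both models' AIC values:

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(2)
n = 400
x1, x2 = rng.normal(size=(2, n))
y = rng.binomial(1, 1 / (1 + np.exp(-(0.2 + 0.9 * x1))))  # only x1 truly matters

null = sm.Logit(y, np.ones((n, 1))).fit(disp=False)       # intercept-only model
full = sm.Logit(y, sm.add_constant(np.column_stack([x1, x2]))).fit(disp=False)

# Likelihood-ratio (chi-square) test: valid here because null is nested in full.
lr = 2 * (full.llf - null.llf)
df = full.df_model - null.df_model
print("chi-square:", lr, "p =", stats.chi2.sf(lr, df))

# AIC also applies to non-nested models; lower values indicate a better fit.
print("AIC null:", null.aic, "AIC full:", full.aic)
```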