Estimating predicted probabilities using Stata
In statistical analysis, particularly with logistic regression models, understanding and interpreting the predicted probabilities can provide valuable insights. Here, we’ll walk through the process of estimating predicted probabilities using Stata.
What are Predicted Probabilities?
Predicted probabilities in the context of logistic regression represent the likelihood of the outcome variable being 1 (or a specific category in multinomial logistic regression) given the values of the predictor variables. Unlike coefficients, which indicate the change in the log-odds of the dependent variable, predicted probabilities provide a more intuitive understanding of the model’s predictions.
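The link between the log-odds scale of the coefficients and the probability scale can be written out explicitly. This is the standard binary logit relationship, not something specific to this example; here x denotes the predictor values and β the coefficient vector.

```latex
\[
\log\frac{p}{1-p} = x\beta
\qquad\Longleftrightarrow\qquad
p = \Pr(y = 1 \mid x) = \frac{e^{x\beta}}{1 + e^{x\beta}} = \frac{1}{1 + e^{-x\beta}}
\]
```

Because this transformation is nonlinear, the same coefficient corresponds to a different change in probability depending on where the other predictors are set, which is why predicted probabilities are always reported at stated covariate values.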
Example: Logistic Regression in Stata
Let’s consider a logistic regression example where we want to predict whether an individual lives in an urban area (urban) based on their sex (sex) and education level (education). Here’s how you can estimate predicted probabilities in Stata:
The first step is to fit the logistic regression model:
. logit urban i.sex i.education

Iteration 0:  Log likelihood =  -59821.57
Iteration 1:  Log likelihood =  -57495.43
Iteration 2:  Log likelihood = -57492.346
Iteration 3:  Log likelihood = -57492.346

Logistic regression                                                    Number of obs =  86,305
                                                                       LR chi2(5)    = 4658.45
                                                                       Prob > chi2   =  0.0000
Log likelihood = -57492.346                                            Pseudo R2     =  0.0389

----------------------------------------------------------------------------------------------
                       urban | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-----------------------------+----------------------------------------------------------------
                         sex |
                     Female  |   .1669514   .0141443    11.80   0.000      .139229    .1946738
                             |
                   education |
Secondary - 3 year Tertia..  |   .8005236   .0151371    52.88   0.000     .7708554    .8301918
Completed four years of e..  |   1.388151   .0247453    56.10   0.000     1.339651     1.43665
                       (DK)  |   .5199512   .1313884     3.96   0.000     .2624347    .7774677
                       (RF)  |   1.299867   .1838019     7.07   0.000     .9396221    1.660112
                             |
                       _cons |  -.6605288   .0145279   -45.47   0.000     -.689003   -.6320546
----------------------------------------------------------------------------------------------
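Before moving on, it can help to see how the coefficients map to probabilities by hand. The following is a minimal sketch, assuming the logit results above are still the active estimation results, that sex is coded 1 = Male and 2 = Female, and that education takes the values 1 to 5 (the coding suggested by the output, not confirmed by the text).

```stata
* Sketch: convert fitted log-odds into probabilities with invlogit().
* Assumes the -logit- model above is still in memory as the active results.

* Male in the base (omitted) education category: only the constant applies
display invlogit(_b[_cons])

* Female in the base education category
display invlogit(_b[_cons] + _b[2.sex])

* Female in the third education category (assumed to be level 3)
display invlogit(_b[_cons] + _b[2.sex] + _b[3.education])
```

With the estimates shown above, the first line should return roughly 0.34. These are probabilities for specific covariate profiles, so they need not match figures computed with the other covariates held at their means.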
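Next, we use the margins command to get the predicted probabilities. The atmeans option evaluates them with the other covariates held at their sample means, and post stores the results as estimation results: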
. margins sex education, atmeans post

Adjusted predictions                                    Number of obs = 86,305
Model VCE: OIM

Expression: Pr(urban), predict()
At: 1.sex = .4360118 (mean)
    2.sex = .5639882 (mean)
    1.edu = .3681247 (mean)
    2.edu = .5125659 (mean)
    3.edu = .1150223 (mean)
    4.edu = .0027113 (mean)
    5.edu = .0015758 (mean)

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         sex |
       Male  |   .4782434   .0026487   180.56   0.000      .473052    .4834348
     Female  |   .5199569   .0023297   223.19   0.000     .5153908    .5245229
             |
   education |
Complete..)  |   .3620748   .0027003   134.09   0.000     .3567824    .3673672
Secondary..  |   .5582723   .0023645   236.11   0.000     .5536381    .5629066
Completed..  |   .6946141   .0046234   150.24   0.000     .6855524    .7036758
       (DK)  |   .4883973   .0327005    14.94   0.000     .4243055    .5524892
       (RF)  |   .6755722   .0402033    16.80   0.000     .5967752    .7543693
------------------------------------------------------------------------------
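The margins command reports the predicted probability of living in an urban area for each category of sex and education, with the other covariates held at their sample means (the atmeans option). For a factor variable, the “mean” is the sample proportion in each category, as listed in the At: block of the output.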
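To make the role of the means concrete, the Male row can be reproduced by hand from the two outputs above. This is a worked check with rounded values: the linear predictor combines the constant with each education coefficient weighted by that category's sample proportion from the At: block, and the sex term drops out because Male is the base category.

```latex
\[
\hat{\eta}_{\text{Male}}
  = -0.6605 + 0.8005(0.5126) + 1.3882(0.1150) + 0.5200(0.0027) + 1.2999(0.0016)
  \approx -0.087
\]
\[
\widehat{\Pr}(\text{urban} = 1 \mid \text{Male})
  = \frac{1}{1 + e^{0.087}} \approx 0.478
\]
```

Adding the Female coefficient (0.167) gives a linear predictor of about 0.080 and a probability of about 0.520, in line with the Female row.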
As we can see, the predicted probability of living in an urban area is approximately 47.8% for men compared with 52.0% for women. Similarly, the predicted probability is approximately 36.2% among those in the lowest education category (“completed primary education”) compared with 69.5% among those in the “Completed four years of e..” category.
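Predictions at the means are only one way to summarize the model. As a hedged sketch of two common alternatives, the commands below assume the same dataset is still in memory; phat is a hypothetical name for a new variable. The model is re-fitted first because margins with the post option replaces the logit results in e().

```stata
* Sketch: alternatives to predictions at the means.
* Re-fit quietly, since -margins, post- overwrote the logit estimation results.
quietly logit urban i.sex i.education

* Individual-level predicted probabilities, one per observation
* (phat is a hypothetical variable name and must not already exist)
predict phat, pr
summarize phat

* Average adjusted predictions: without atmeans, margins computes each
* prediction at the observed covariate values and then averages them
margins sex education
```

In a nonlinear model such as logit, the two approaches generally give somewhat different numbers, so it is worth stating which one is being reported.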