Estimating predicted probabilities using Stata

infoart.ca
Jul 31, 2024

In statistical analysis, particularly with logistic regression models, understanding and interpreting the predicted probabilities can provide valuable insights. Here, we’ll walk through the process of estimating predicted probabilities using Stata.

What are Predicted Probabilities?

Predicted probabilities in the context of logistic regression represent the likelihood of the outcome variable being 1 (or a specific category in multinomial logistic regression) given the values of the predictor variables. Unlike coefficients, which indicate the change in the log-odds of the dependent variable, predicted probabilities provide a more intuitive understanding of the model’s predictions.

Example: Logistic Regression in Stata

Let’s consider a logistic regression example where we want to predict whether an individual lives in an urban area (urban) based on their sex (sex) and education level (education). Here's how you can estimate predicted probabilities in Stata:

Step 1 is to run the logistic regression model:

. logit urban i.sex i.education

Iteration 0: Log likelihood = -59821.57
Iteration 1: Log likelihood = -57495.43
Iteration 2: Log likelihood = -57492.346
Iteration 3: Log likelihood = -57492.346

Logistic regression                                     Number of obs =  86,305
                                                        LR chi2(5)    = 4658.45
                                                        Prob > chi2   =  0.0000
Log likelihood = -57492.346                             Pseudo R2     =  0.0389

----------------------------------------------------------------------------------------------
                       urban | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-----------------------------+----------------------------------------------------------------
                         sex |
                     Female  |   .1669514   .0141443    11.80   0.000      .139229    .1946738
                             |
                   education |
Secondary - 3 year Tertia..  |   .8005236   .0151371    52.88   0.000     .7708554    .8301918
Completed four years of e..  |   1.388151   .0247453    56.10   0.000     1.339651     1.43665
                       (DK)  |   .5199512   .1313884     3.96   0.000     .2624347    .7774677
                       (RF)  |   1.299867   .1838019     7.07   0.000     .9396221    1.660112
                             |
                       _cons |  -.6605288   .0145279   -45.47   0.000     -.689003   -.6320546
----------------------------------------------------------------------------------------------
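To see how these log-odds coefficients map onto probabilities, the inverse-logit transformation can be applied by hand. The sketch below assumes the omitted (base) categories are Male and the first education level, as the output suggests; the numbers are copied from the coefficient table above.

```stata
* Base group (Male, first education level): only the constant enters
* the linear predictor
display invlogit(-.6605288)

* Female, first education level: constant plus the Female coefficient
display invlogit(-.6605288 + .1669514)
```

These should come out near 0.34 and 0.38. Each is the predicted probability for one specific combination of sex and education.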

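Next, we use the margins command to get the predicted probabilities. The atmeans option holds the other covariates at their sample means, and the post option stores the margins results as estimation results: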


. margins sex education, atmeans post

Adjusted predictions                                    Number of obs = 86,305
Model VCE: OIM

Expression: Pr(urban), predict()
At: 1.sex = .4360118 (mean)
    2.sex = .5639882 (mean)
    1.edu = .3681247 (mean)
    2.edu = .5125659 (mean)
    3.edu = .1150223 (mean)
    4.edu = .0027113 (mean)
    5.edu = .0015758 (mean)

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         sex |
       Male  |   .4782434   .0026487   180.56   0.000      .473052    .4834348
     Female  |   .5199569   .0023297   223.19   0.000     .5153908    .5245229
             |
   education |
Complete..)  |   .3620748   .0027003   134.09   0.000     .3567824    .3673672
Secondary..  |   .5582723   .0023645   236.11   0.000     .5536381    .5629066
Completed..  |   .6946141   .0046234   150.24   0.000     .6855524    .7036758
       (DK)  |   .4883973   .0327005    14.94   0.000     .4243055    .5524892
       (RF)  |   .6755722   .0402033    16.80   0.000     .5967752    .7543693
------------------------------------------------------------------------------

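With the atmeans option, the margins command reports the predicted probability for each category of sex and education while holding the other variable at its sample mean (for a categorical variable, the sample proportions listed under At:). These are predictions at the means, not averages of each individual's predicted probability.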
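A common alternative is to average the predicted probabilities over the observed data rather than evaluating them at the means. A minimal sketch, assuming the same variables as above; the model is re-fitted first because the post option replaces the stored logit estimates with the margins results.

```stata
* Re-fit quietly: after "margins ..., post" the logit results are no
* longer the active estimation results
quietly logit urban i.sex i.education

* Average adjusted predictions (no atmeans): each observation keeps its
* own value of the other covariate, and the predictions are averaged
margins sex education

* Plot the predicted probabilities by education level with
* confidence intervals
margins education
marginsplot
```

The two approaches need not give identical numbers, so it helps to state which one was used when reporting results.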

As we can see, the predicted probability of living in an urban area is approximately 47.8% for men compared to 52.0% for women. Similarly, it is approximately 36.2% among those who completed primary education (the first education row, labeled “Complete..)”) compared to 69.5% among those in the “Completed four years of e..” category (the third education row).
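The gap between groups can also be estimated directly, with its own standard error, instead of being read off the table. The sketch below assumes the model is re-fitted first (the earlier post option overwrote it) and keeps the same atmeans setting as the example.

```stata
* Restore the logit results as the active estimates
quietly logit urban i.sex i.education

* Difference in predicted probability, Female vs. Male, with education
* held at its sample means
margins, dydx(sex) atmeans

* Predicted probability for every sex-by-education combination
margins sex#education
```

For a factor variable, dydx() reports the discrete change from the base category, so the first command should show a difference of roughly 4 percentage points, in line with the Male and Female rows above.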


Written by infoart.ca

Center for Social Capital & Environmental Research | Posts by Bishwajit Ghose, BI consultant and lecturer at the University of Ottawa