Understanding path analysis: paths, standardized coefficients and causality

infoart.ca
5 min read · Feb 8, 2025

Path analysis is part of the family of structural equation modeling (SEM) techniques, but it focuses solely on observed variables (not latent variables). It is a statistical technique that extends multivariable regression to model the direct and indirect relationships among variables within a predefined causal framework. In this article, we will explore the basics of path analysis and how to interpret its results, including coefficients, standardization, and causality. We will use an example dataset to build a causal model in R.

Path analysis requires that the relationships between variables are specified a priori based on theory or prior evidence.

What is path analysis?

Path analysis is a commonly used method for modelling causal relationships between variables. At its core, it helps us understand how variables interact directly and indirectly within a given model. Path analysis produces path coefficients, which quantify the strength of these relationships. Specifically, it estimates direct effects (relationships between variables connected by a single arrow), assesses indirect effects (relationships mediated through intermediate variables), and thereby estimates the total effect of one variable on another (direct + indirect effects).

Total effects = direct effects + indirect effects

Path analysis differs from regular regression methods by clearly distinguishing between direct effects (the immediate relationships between variables) and indirect effects (the relationships that are mediated through other variables). Unlike simple correlations, path coefficients are derived from a causal model, reflecting hypothesized cause-and-effect relationships.

What is a path in path analysis?

In path analysis, a path refers to the directional relationship between variables in a model, represented by arrows in a path diagram. These paths signify hypothesized causal effects, based on theory or prior research. The single-headed arrows (→) indicate direct effects of one variable on another, while double-headed arrows (↔) indicate correlations or covariances between variables without implying causation.

For example, in a path model:

  • X → Y represents a direct effect of variable X on variable Y.
  • X ↔ Z suggests that X and Z are correlated but do not necessarily have a causal relationship.
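The two arrow types map directly onto lavaan's model syntax, which is used later in this article: the `~` operator declares a regression (a single-headed arrow) and `~~` declares a covariance (a double-headed arrow). A minimal sketch, where X, Y and Z are placeholder variable names rather than columns of any real dataset:

```r
# lavaan model syntax is written as a plain character string.
# X, Y and Z are hypothetical variable names used only for illustration.
path_syntax <- '
  Y ~ X     # X -> Y : direct effect of X on Y (regression)
  X ~~ Z    # X <-> Z: X and Z covary, no direction implied
'

# Print the syntax; it would be passed to lavaan::sem() together with a data frame
cat(path_syntax)
```

Note that the outcome goes on the left of `~`, so the arrow in the diagram points from the right-hand side to the left-hand side of the formula.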

What is a path coefficient and how do you interpret it?

Path analysis produces path coefficients, which are standardized regression coefficients (also called beta coefficients). Each coefficient measures the direct effect of one variable on another in a causal model, while accounting for the influence of the other variables in the model. Because the coefficients are standardized, they are expressed in standard deviation units, which allows comparison across variables with different units of measurement.

Interpreting path coefficients involves evaluating their magnitude, direction, and statistical significance:

1. Magnitude (strength of the effect), judged by the absolute value of the coefficient as a rough rule of thumb:

  • below 0.1: very weak effect
  • 0.1 to 0.3: weak effect
  • 0.3 to 0.5: moderate effect
  • 0.5 and above: strong effect

2. Sign (direction of the effect):

  • Positive (+): A positive path coefficient indicates that an increase in the independent variable (IV) leads to an increase in the dependent variable (DV).
  • Negative (-): A negative path coefficient indicates the opposite: an increase in the IV leads to a decrease in the DV.

3. Statistical significance:

The p-value that accompanies each coefficient indicates whether the coefficient is statistically significant. By convention, a p-value below 0.05 is treated as statistically significant.

Path coefficients tell us how strongly one variable influences another and in which direction.
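The three checks above can be bundled into a small helper. This is only a sketch: `describe_path()` is a hypothetical function written for this article, not part of lavaan, and it assumes each cut-off belongs to the stronger category (so exactly 0.3 counts as moderate), which the rule of thumb leaves open.

```r
# Hypothetical helper: label path coefficients by magnitude, sign and significance.
# Assumption: boundaries are closed on the left, e.g. |beta| = 0.3 is "moderate".
describe_path <- function(beta, p_value, alpha = 0.05) {
  strength <- cut(abs(beta),
                  breaks = c(0, 0.1, 0.3, 0.5, Inf),
                  labels = c("very weak", "weak", "moderate", "strong"),
                  right = FALSE)
  direction <- ifelse(beta > 0, "positive",
                      ifelse(beta < 0, "negative", "zero"))
  data.frame(beta = beta,
             strength = strength,
             direction = direction,
             significant = p_value < alpha)
}

# Made-up coefficients and p-values, for illustration only
describe_path(beta = c(0.6, -0.05, 0.3), p_value = c(0.001, 0.20, 0.04))
```

The labels are a reading aid, not a test: a "very weak" coefficient can still be statistically significant in a large sample.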

Let’s use a hypothetical example with the following relationships to better understand the numbers:

Direct Effects:

Education → Income (path coefficient = 0.6)
A 1 SD increase in Education leads to a 0.6 SD increase in Income.

Income → Job Satisfaction (path coefficient = 0.5)
A 1 SD increase in Income leads to a 0.5 SD increase in Job Satisfaction.

Education → Job Satisfaction (path coefficient = 0.3)
A 1 SD increase in Education directly leads to a 0.3 SD increase in Job Satisfaction.

Indirect Effect:

Indirect Effect = (Education → Income) × (Income → Job Satisfaction)
= 0.6 × 0.5
= 0.3

Total Effect:

The total effect of Education on Job Satisfaction is the sum of the direct and indirect effects:

TE = Direct Effect + Indirect Effect
= 0.3 + 0.3
= 0.6
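To see this arithmetic at work, the sketch below simulates data with the same three path coefficients and recovers them with ordinary regressions. The data are simulated, not real; the sample size, seed and variable names are assumptions made for illustration. For a simple recursive model like this one, regressing each outcome on its predictors with standardized variables gives the path coefficients.

```r
set.seed(42)
n <- 5000

# Simulate standardized-scale data with the hypothetical paths 0.6, 0.5 and 0.3.
# Residual SDs are chosen so each variable has variance close to 1.
education <- rnorm(n)
income    <- 0.6 * education + rnorm(n, sd = sqrt(1 - 0.6^2))
job_sat   <- 0.3 * education + 0.5 * income + rnorm(n, sd = sqrt(0.48))
d <- as.data.frame(scale(data.frame(education, income, job_sat)))

# One regression per endogenous variable
a        <- coef(lm(income ~ education, data = d))[["education"]]   # Education -> Income
fit_y    <- lm(job_sat ~ education + income, data = d)
b        <- coef(fit_y)[["income"]]                                 # Income -> Job Satisfaction
c_direct <- coef(fit_y)[["education"]]                              # Education -> Job Satisfaction

indirect <- a * b
total    <- c_direct + indirect

round(c(a = a, b = b, direct = c_direct, indirect = indirect, total = total), 2)

# The total effect matches the coefficient from regressing Job Satisfaction on Education alone
c_total <- coef(lm(job_sat ~ education, data = d))[["education"]]
all.equal(total, c_total)
```

The estimates will be close to, but not exactly, the values used in the simulation because of sampling noise.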

Why Use Standardization in Path Analysis?

Standardization plays a crucial role in path analysis. By converting variables to a common scale (mean = 0, SD = 1), it allows direct comparison between different path coefficients. Without standardization, the size of a coefficient depends on the units of the variables involved (e.g. income in thousands vs. BMI in kg/m²), so comparing coefficients across paths could be misleading.
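A quick way to see what standardization does is to compare two routes to the same standardized coefficients: rescaling the raw slopes by SD(x) / SD(y), and refitting the model on z-scored variables. The sketch below uses R's built-in mtcars data purely as a stand-in (it is not part of this article's analysis), with mpg, wt and hp chosen because they are measured in very different units.

```r
# Raw (unstandardized) regression: slopes depend on the units of wt and hp
fit_raw <- lm(mpg ~ wt + hp, data = mtcars)
b_raw   <- coef(fit_raw)[c("wt", "hp")]

# Route 1: rescale each raw slope by SD(predictor) / SD(outcome)
beta_manual <- b_raw * sapply(mtcars[c("wt", "hp")], sd) / sd(mtcars$mpg)

# Route 2: z-score every variable (mean 0, SD 1) and refit
z <- as.data.frame(scale(mtcars[c("mpg", "wt", "hp")]))
beta_scaled <- coef(lm(mpg ~ wt + hp, data = z))[c("wt", "hp")]

b_raw        # not comparable: per 1000 lbs vs. per horsepower
beta_manual  # comparable: both in SD units
all.equal(beta_manual, beta_scaled)
```

Both routes give the same numbers, which is why software such as lavaan can report standardized estimates without you having to rescale the data yourself.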

Example analysis in R:

We will perform an example analysis using the NHANES dataset (available in R through the NHANES package), which contains a wealth of data on socioeconomic and health status. Specifically, we will explore how income and housing conditions influence mental health and drug use through the following hypotheses:

H1: Homeownership (HomeOwn) is associated with fewer days of bad mental health (DaysMentHlthBad).

H2: Higher household income (HHIncome) is associated with fewer days of bad mental health (DaysMentHlthBad).

H3: Fewer days of bad mental health (DaysMentHlthBad) are associated with lower likelihood of using hard drugs (HardDrugs).

H4: Homeownership (HomeOwn) is associated with lower likelihood of using hard drugs (HardDrugs).

Here is the complete code we are going to use:

library(lavaan)
library(lavaanExtra)
library(dplyr)
library(NHANES)  # provides the NHANES data frame loaded below

data(NHANES)

# Prep the data first: keep complete cases and recode every variable as numeric
nhanes_data <- NHANES %>%
  filter(!is.na(HHIncome), !is.na(DaysMentHlthBad), !is.na(HomeOwn), !is.na(HardDrugs)) %>%
  mutate(HHIncome = as.numeric(factor(HHIncome)),     # income bracket -> integer code
         DaysMentHlthBad = as.numeric(DaysMentHlthBad),
         HomeOwn = ifelse(HomeOwn == "Own", 1, 0),    # 1 = owns home, 0 = rents/other
         HardDrugs = ifelse(HardDrugs == "Yes", 1, 0)) # 1 = has used hard drugs

# Here is the path model we are using to meet the objectives of the analysis.
# Each line is one regression: outcome ~ predictors.
model <- '
  DaysMentHlthBad ~ HHIncome + HomeOwn
  HardDrugs ~ HHIncome + HomeOwn + DaysMentHlthBad
'

# Now we can fit the model using the sem function from lavaan
fit <- sem(model, data = nhanes_data)

# Visualise the model using the nice_lavaanPlot function from the lavaanExtra package
# (it relies on the lavaanPlot package being installed)
nice_lavaanPlot(model = fit,
                node_options = list(shape = "box", fontname = "Helvetica"),
                edge_options = c(color = "black"),
                coefs = TRUE,   # print coefficients on the arrows
                stand = TRUE,   # use standardized estimates
                covs = TRUE,
                stars = c("regress", "latent", "covs"),
                sig = 0.05,
                graph_options = c(rankdir = "LR"))

# Print the model summary with standardized estimates and fit statistics
summary(fit, standardized = TRUE, fit.measures = TRUE)

And here is the plot the code generates for us:

Let’s now review the results in light of the hypotheses:

H1: Homeownership (HomeOwn) is associated with fewer days of bad mental health (DaysMentHlthBad). H1 is supported by the results: the path coefficient is -0.05***, i.e. homeowners experience fewer days of poor mental health than non-homeowners.

H2: Higher household income (HHIncome) is associated with fewer days of bad mental health (DaysMentHlthBad). H2 is also supported by the results: the path coefficient is -0.11***, i.e. higher income goes with fewer days of poor mental health.

H3: Fewer days of bad mental health (DaysMentHlthBad) are associated with lower likelihood of using hard drugs (HardDrugs). H3 is also supported by the results: the path coefficient is 0.1***, i.e. those experiencing more days of poor mental health are more likely to turn to drug use.

H4: Homeownership (HomeOwn) is associated with lower likelihood of using hard drugs (HardDrugs). H4 is also supported by the results: the path coefficient is -0.08***, i.e. those who own their homes are less likely to engage in drug use.
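The four hypotheses cover direct paths only. As a possible extension, the sketch below shows one way to also estimate the indirect effect of homeownership on hard drug use through mental health, using lavaan's parameter labels and the `:=` operator for defined parameters. The labels (a_own, b, c_own and so on) and the names ind_own and tot_own are choices made here for illustration, and the data preparation is repeated in base R so that the block runs on its own. Like the model above, it treats the binary HardDrugs variable as if it were continuous.

```r
library(lavaan)
library(NHANES)

data(NHANES)

# Same preparation as in the main analysis, written in base R
vars <- c("HHIncome", "DaysMentHlthBad", "HomeOwn", "HardDrugs")
d <- as.data.frame(NHANES[complete.cases(NHANES[, vars]), vars])
d$HHIncome        <- as.numeric(factor(d$HHIncome))
d$DaysMentHlthBad <- as.numeric(d$DaysMentHlthBad)
d$HomeOwn         <- as.numeric(d$HomeOwn == "Own")
d$HardDrugs       <- as.numeric(d$HardDrugs == "Yes")

# Label each path, then define indirect and total effects from the labels
model_ie <- '
  DaysMentHlthBad ~ a_inc*HHIncome + a_own*HomeOwn
  HardDrugs ~ c_inc*HHIncome + c_own*HomeOwn + b*DaysMentHlthBad

  ind_own := a_own * b            # HomeOwn -> DaysMentHlthBad -> HardDrugs
  tot_own := c_own + a_own * b    # direct + indirect
'

fit_ie <- sem(model_ie, data = d)

# Keep only the defined parameters; std.all holds the standardized estimates
est <- parameterEstimates(fit_ie, standardized = TRUE)
est[est$op == ":=", c("lhs", "est", "se", "pvalue", "std.all")]
```

The default standard errors for defined parameters are based on the delta method; lavaan can also bootstrap them via `se = "bootstrap"` in `sem()`, which is often preferred for products of coefficients.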

Written by infoart.ca

Center for Social Capital & Environmental Research | Posts by Bishwajit Ghose, BI consultant and lecturer at the University of Ottawa