Creating a Population Pyramid in R
Population pyramids are often used in demography, public health, and social sciences to visualize the age and sex distribution of a population. In this tutorial, we will learn how to create a population pyramid in R using ggplot2.
We need to load the ggplot2 package and create the dataset.
library(ggplot2)

# the 8 age groups, youngest first
ages <- c("15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49", "50-54")
df <- data.frame(
  # repeated once per sex; levels run oldest first, so after coord_flip()
  # the oldest group is drawn at the bottom of the chart
  age_groups = factor(rep(ages, times = 2), levels = rev(ages)),
  sex = factor(rep(c("Men", "Women"), each = 8)),
  count = c(21, 22, 39, 44, 81, 77, 103, 92,
            -41, -139, -198, -209, -249, -253, -235, 0)
)
Note that we have made the counts for women negative. This is because in a population pyramid, the bars for men extend to the right, while the bars for women extend to the left. Making the counts for women negative allows us to achieve this effect.
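If the raw data stores positive counts for both sexes, the sign flip can be done in code instead of by hand. The following is a minimal sketch: the data frame raw is a hypothetical two-group example (its name and layout are assumptions, not part of the dataset above), and only the rows for women are negated.

```r
# Hypothetical raw data with positive counts for both sexes.
raw <- data.frame(
  age_groups = rep(c("15-19", "20-24"), times = 2),
  sex = rep(c("Men", "Women"), each = 2),
  count = c(21, 22, 41, 139)
)

# Flip the sign for women only, so their bars extend to the left of zero.
raw$count <- ifelse(raw$sex == "Women", -raw$count, raw$count)
print(raw)
```

Keeping the original positive values in the source data and negating them just before plotting makes it less likely that a negative count leaks into other calculations.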
Plot the Data
# plot the data
p <- ggplot(df, aes(x = age_groups, y = count, fill = sex)) +
  geom_bar(stat = "identity", width = 0.9) +
  coord_flip() +
  scale_y_continuous(labels = abs) +
  labs(x = "Age Group", y = "Count", fill = "Sex",
       title = "Population Pyramid of Hypertension by Age and Sex") +
  theme_minimal()
print(p)
Customizing the Population Pyramid
We can change the colors of the bars using the scale_fill_manual function. Here, we will change the color for men to blue and the color for women to pink:
p + scale_fill_manual(values = c("blue", "pink"))
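An unnamed vector of colors is matched to the levels of sex in order, so the call above works because "Men" sorts before "Women". A named vector makes the mapping explicit and keeps it correct if the level order changes. The sketch below uses a small stand-in data frame, toy, and a helper vector, sex_colors; both names are hypothetical and chosen only for illustration.

```r
library(ggplot2)

# Small stand-in for the tutorial's data frame (two age groups only).
toy <- data.frame(
  age_groups = factor(rep(c("15-19", "20-24"), times = 2)),
  sex = factor(rep(c("Men", "Women"), each = 2)),
  count = c(21, 22, -41, -139)
)

# Names must match the levels of the fill variable.
sex_colors <- c(Men = "blue", Women = "pink")

p_toy <- ggplot(toy, aes(x = age_groups, y = count, fill = sex)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  scale_fill_manual(values = sex_colors)
print(p_toy)
```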
To make the chart more informative, we can add data labels to the bars. This can be done using the geom_text function.
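One way to add the labels is sketched below. It rebuilds the data and the base plot so that it runs on its own, then adds a geom_text layer that prints the absolute count next to each bar. The hjust values and the text size are assumptions picked for illustration, and the name p_labeled is hypothetical.

```r
library(ggplot2)

# Same data as above; counts for women are stored as negative numbers.
ages <- c("15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49", "50-54")
df <- data.frame(
  age_groups = factor(rep(ages, times = 2), levels = rev(ages)),
  sex = factor(rep(c("Men", "Women"), each = 8)),
  count = c(21, 22, 39, 44, 81, 77, 103, 92,
            -41, -139, -198, -209, -249, -253, -235, 0)
)

p <- ggplot(df, aes(x = age_groups, y = count, fill = sex)) +
  geom_bar(stat = "identity", width = 0.9) +
  coord_flip() +
  scale_y_continuous(labels = abs) +
  theme_minimal()

# Show the absolute count at the end of each bar. hjust pushes the text
# outward: to the left for women, to the right for men.
p_labeled <- p +
  geom_text(aes(label = abs(count),
                hjust = ifelse(sex == "Women", 1.1, -0.1)),
            size = 3)
print(p_labeled)
```

The labels use abs(count) for the same reason the axis uses labels = abs: the negative sign is only a plotting device and should not be shown to the reader. If the outermost labels are cut off at the plot edge, widening the axis limits is one possible remedy.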
We can also change the plot background and the color of the grid lines using the theme function:
p + theme(plot.background = element_rect(fill = "lightgrey"),
          panel.grid.major = element_line(color = "white"),
          panel.grid.minor = element_line(color = "white"))