Plotting global and regional urbanization trends in R
Urbanization, the process of population concentration in urban areas, is a critical demographic trend with significant implications for development and sustainability. In this tutorial, we will visualize urbanization trends globally and regionally using R.
We will use the wbstats package to extract World Bank data on the percentage of the total population living in urban areas (SP.URB.TOTL.IN.ZS). We'll also merge this data with information on country regions for regional analysis, all in just a few lines of code. You can adapt this code to plot any other WDI indicator.
First, let’s load the libraries:
library(dplyr)
library(wbstats)
library(ggplot2)
library(rio)
library(tidyr)
And now the code to make the plot:
# Country-level urban population share, joined to each country's region
df <- wbstats::wb_data(indicator = 'SP.URB.TOTL.IN.ZS') %>%
  select(year = date, value = SP.URB.TOTL.IN.ZS, country) %>%
  inner_join(read.csv('https://raw.githubusercontent.com/datalake101/continents/main/continent.csv'), by = 'country')

# Unweighted mean per year and region, plus a "Global" series
# computed as the mean of the regional means
combined_data <- df %>%
  group_by(year, region) %>%
  summarise(mean_value = mean(value, na.rm = TRUE), .groups = 'drop') %>%
  bind_rows(
    group_by(., year) %>%
      summarise(mean_value = mean(mean_value, na.rm = TRUE), .groups = 'drop') %>%
      mutate(region = "Global"))

# Black for the global line, one rainbow color per region
regions <- setdiff(unique(combined_data$region), "Global")
region_colors <- c("Global" = "black", setNames(rainbow(length(regions)), regions))

ggplot(combined_data, aes(x = year, y = mean_value, group = region, color = region)) +
  geom_line() +
  geom_point() +
  scale_color_manual(values = region_colors) +
  scale_x_continuous(breaks = unique(df$year)) +
  ylim(0, 100) +
  labs(y = '', x = '', title = 'Urbanization rate (% population)') +
  theme(axis.text.x = element_text(angle = 90, size = 14),
        axis.text.y = element_text(size = 14),
        legend.text = element_text(size = 12),
        legend.title = element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        panel.border = element_blank(),
        legend.background = element_blank(),
        legend.key = element_blank(),
        legend.position = "bottom")
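To make the aggregation step easier to follow, here is a minimal sketch on made-up data. The toy data frame and its values are hypothetical (they are not World Bank figures); only the grouping logic mirrors the code above. It shows that the "Global" series is a mean of regional means, which can differ from the plain mean across all countries.

```r
library(dplyr)

# Hypothetical toy data: three countries in two regions, one year
toy <- data.frame(
  year    = 2020,
  country = c("A", "B", "C"),
  region  = c("North", "North", "South"),
  value   = c(80, 60, 40)
)

# Step 1: unweighted mean per year and region
regional <- toy %>%
  group_by(year, region) %>%
  summarise(mean_value = mean(value, na.rm = TRUE), .groups = "drop")

# Step 2: "Global" is the mean of the regional means
global <- regional %>%
  group_by(year) %>%
  summarise(mean_value = mean(mean_value, na.rm = TRUE), .groups = "drop") %>%
  mutate(region = "Global")

combined <- bind_rows(regional, global)
print(combined)

# For comparison: the mean across all countries, where each country counts once
pooled <- mean(toy$value)
print(pooled)
```

In this toy case the regional means are 70 and 40, so the "Global" value is 55, while the mean across the three countries is 60. Neither figure is population-weighted, so both should be read as averages of country rates rather than as the share of people living in cities.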
For the latest year in the data (2023 here), we can visualize the 20 individual countries with the highest rates, colored by region:
df %>%
  filter(!is.na(value)) %>%               # drop missing observations first
  filter(year == max(year)) %>%           # keep the latest year with data
  filter(region != "Aggregates") %>%
  top_n(20, value) %>%                    # positive n keeps the highest values
  mutate(country = forcats::fct_reorder(country, value)) %>%
  ggpubr::ggdotchart(x = "country", y = "value",
                     color = "region",
                     palette = c("#BAE4D0", "#51C9C2", "#FC4E07", '#FFB32C', '#FF5E43', '#9557FF', '#7E544D'),
                     sorting = "descending",
                     add = "segments",
                     rotate = TRUE,
                     group = "region",
                     dot.size = 6,
                     font.label = list(color = "black", size = 9, vjust = .7, hjust = -.5),
                     ggtheme = ggpubr::theme_pubr()) +
  labs(y = '', x = '') +
  scale_y_continuous(expand = expansion(mult = c(0, 0.3))) +
  theme(axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        axis.ticks.y = element_blank(),
        legend.title = element_blank(),
        legend.position = 'bottom') +
  # label each dot with its rounded urbanization rate
  geom_text(aes(label = round(value, 0)), hjust = -0.5, size = 3)
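The sign of the first argument to top_n() decides whether the highest or the lowest values are kept, which is easy to get backwards. The sketch below uses a hypothetical four-row data frame (not real urbanization figures) to show both directions, along with slice_max(), which newer versions of dplyr offer for the same purpose.

```r
library(dplyr)

# Hypothetical toy data
toy <- data.frame(
  country = c("A", "B", "C", "D"),
  value   = c(95, 20, 70, 45)
)

# Positive n: rows with the largest values
highest <- toy %>% top_n(2, value)

# Negative n: rows with the smallest values
lowest <- toy %>% top_n(-2, value)

# slice_max() also selects the largest values and returns them sorted
also_highest <- toy %>% slice_max(value, n = 2)

print(highest)
print(lowest)
print(also_highest)
```

Here the positive call keeps countries A and C, and the negative call keeps B and D. For a chart of the most urbanized countries, the positive form (or slice_max()) is the one to use.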