Extracting World Bank data in Stata

Dec 29, 2022

Stata is one of the most popular programs for statistical analysis, particularly in economics, the social sciences, and public health. It is a powerful statistical package, first released in 1985 and developed by StataCorp. Since then, it has become widely adopted by individuals and organizations, thanks in part to its ability to manage and analyze large datasets.

Stata’s graphical user interface (GUI) lets users carry out complex tasks through menus and dialog boxes relatively easily, which makes manipulating and visualizing data more efficient. Additionally, an extensive library of user-written packages, many of them distributed through the SSC (Statistical Software Components) archive, adds capabilities that are not available in the base installation of the software.

In this post, I’d like to give a brief sketch of wbopendata, a user-written package that extracts data from the World Bank’s open data repository, of which I happen to be a great fan!

The package can be installed using the following command:

ssc install wbopendata

Beginners will find an excellent manual through the help command:

help wbopendata

Now, to open the package’s dialog box (its GUI), just type:

db wbopendata

It’s a very neat and self-explanatory interface, but I generally write the command myself:

wbopendata, indicator(indicator1; indicator2; ...) long clear

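The ‘long’ option is optional: it returns the data with one row per country and year, a year variable, and one variable per indicator, instead of the default wide layout with a separate variable for each year. The panel commands later in this post assume the long layout. The ‘clear’ option, which I always include, allows wbopendata to replace whatever dataset is currently in memory with the download.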
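To see what the two options produce before requesting several indicators, here is a minimal sketch that downloads just one of the codes used below (SP.DYN.LE00.IN) and inspects the result. The variable names countryname, year, and sp_dyn_le00_in match those used later in this post; any other variables wbopendata returns are not shown here and may vary.

```stata
* Download one indicator in long layout; clear lets the download
* replace whatever dataset is currently in memory
wbopendata, indicator(SP.DYN.LE00.IN) long clear

* Overview of the dataset: number of observations and variables
describe, short

* First few rows: one row per country and year
list countryname year sp_dyn_le00_in in 1/5
```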

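The database features hundreds of indicators, each with a unique ID or code. To find a code, search for the indicator on the World Bank data website and open its page: the code appears in the address bar after https://data.worldbank.org/indicator/ (for example, https://data.worldbank.org/indicator/SP.DYN.LE00.IN for life expectancy at birth).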

Once the codes are recorded, the data can be requested using the following command:

wbopendata, indicator(SP.DYN.LE00.FE.IN; SP.DYN.LE00.MA.IN; SP.DYN.LE00.IN; SI.POV.DDAY) long clear

The four indicators are life expectancy at birth for females, males, and the total population (in years), and the poverty headcount ratio at $2.15 a day (2017 PPP, % of population).
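Each indicator arrives as a variable named after its code. Judging by the names used later in this post (sp_dyn_le00_fe_in for SP.DYN.LE00.FE.IN), the code is lower-cased and its dots become underscores. The sketch below assumes that rule, derives the four variable names from the codes, checks that each one exists, and summarizes them.

```stata
* Download the four indicators from the post
wbopendata, indicator(SP.DYN.LE00.FE.IN; SP.DYN.LE00.MA.IN; SP.DYN.LE00.IN; SI.POV.DDAY) long clear

* Assumed naming rule: lower-case the code and replace "." with "_"
foreach code in SP.DYN.LE00.FE.IN SP.DYN.LE00.MA.IN SP.DYN.LE00.IN SI.POV.DDAY {
    local vname = lower(subinstr("`code'", ".", "_", .))
    display "`code' -> `vname'"
    * Stops with an error if no variable of that name exists
    confirm variable `vname'
}

* Count of non-missing values, mean, and range for each indicator
summarize sp_dyn_le00_fe_in sp_dyn_le00_ma_in sp_dyn_le00_in si_pov_dday
```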

Let’s do some exploration of the variables. Right now, the country name is stored as a string variable (countryname), while xtset needs a numeric panel identifier. We can encode it to a labelled numeric variable and generate an ID variable from that:

encode countryname, g(country)
egen id = group(country), label

Success! Now we’ll declare the dataset as a panel, with id as the panel variable and year as the time variable:

xtset id year
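A quick way to check that the identifiers and the panel declaration behave as intended is sketched below; it rebuilds the data with a single indicator so it can run on its own. Because encode numbers the sorted country names 1, 2, 3, and so on, and egen group() numbers the sorted values of country the same way, id and country should be identical. xtset also requires exactly one observation per id-year pair.

```stata
* Setup from earlier in the post, with a single indicator
wbopendata, indicator(SP.DYN.LE00.IN) long clear
encode countryname, generate(country)
egen id = group(country), label

* Both numberings follow alphabetical order, so they should coincide
assert id == country

* isid stops with an error if any id-year pair appears more than once
isid id year
xtset id year

* Panel summary: number of countries, years covered, participation patterns
xtdescribe
```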

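The panel declaration is what xtline needs further down; a plain twoway chart works without it. For now, let’s plot female and male life expectancy against year for all countries at once: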

twoway (scatter sp_dyn_le00_fe_in year) (scatter sp_dyn_le00_ma_in year)

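Looks a bit messy, since every country is plotted at once. Let’s rename the life expectancy variables to LE_Female, LE_Male, and LE_ALL and make a multi-country time series chart of overall life expectancy for a handful of countries:

rename (sp_dyn_le00_fe_in sp_dyn_le00_ma_in sp_dyn_le00_in) (LE_Female LE_Male LE_ALL)
xtline LE_ALL if inlist(id, 1, 12, 34, 32, 35, 45), overlay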
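The numbers in inlist() are positions in the alphabetical country list, so it is not obvious which countries they are, and they could shift if the downloaded list changes. The minimal sketch below (setup repeated with only the overall life expectancy indicator) first shows which countries those ids stand for and then selects countries by name instead; Brazil, India, and Japan are illustrative choices, not the ones in the chart above.

```stata
* Setup from earlier in the post, with a single indicator
wbopendata, indicator(SP.DYN.LE00.IN) long clear
encode countryname, generate(country)
egen id = group(country), label
rename sp_dyn_le00_in LE_ALL
xtset id year

* Which countries do the ids in the xtline call refer to?
tabulate countryname if inlist(id, 1, 12, 34, 32, 35, 45)

* Select countries by name instead of by id (illustrative choices)
xtline LE_ALL if inlist(countryname, "Brazil", "India", "Japan"), overlay
```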

To conclude, we will limit the chart to the first country in the dataset (id 1) and, of course, use a cute colour scheme, plottig from the user-written blindschemes package, to give the chart a proper look:

twoway (scatter LE_Female year) (scatter LE_Male year) if id == 1, scheme(plottig) xlabel(1960(5)2021)
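plottig is not one of Stata’s built-in schemes; it is distributed with the user-written blindschemes package on SSC, so the chart above fails with an unknown-scheme error until that package is installed. A minimal sketch for installing it and, optionally, making it the default for the rest of the session:

```stata
* Install (or update) the blindschemes package, which provides plottig
ssc install blindschemes, replace

* Use plottig for all subsequent graphs in this session,
* so scheme(plottig) no longer has to be typed each time
set scheme plottig

* Confirm the active scheme
display "`c(scheme)'"
```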

That’s a quick tour of this mighty package. I hope you enjoyed it, and I shall be back with more soon!


Written by infoart.ca
