Getting continents from country names in R
In data analysis, it is often necessary to take country-level data and analyze it by continent, and mapping every country to its continent by hand is tedious. In this blog post, we will look at how to get continents from country names using the countrycode package.
Here are the steps:
1. Getting the data: I downloaded data on rice production by country from FAO:
https://www.fao.org/faostat/en/#data/QCL
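The post does not show how df was created from the download. As a minimal sketch, assuming the FAOSTAT data was exported as a CSV file whose country column is named Area (the helper name and file name below are hypothetical), the three columns used in this post could be pulled out with base R:

```r
# Hypothetical helper: read a FAOSTAT CSV export and keep only the
# columns used in this post, renamed to Country, Year and Value.
# Assumes the export has "Area", "Year" and "Value" columns.
read_fao_export <- function(path) {
  raw <- read.csv(path, stringsAsFactors = FALSE)
  data.frame(
    Country = raw$Area,
    Year    = raw$Year,
    Value   = raw$Value,
    stringsAsFactors = FALSE
  )
}

# Usage (hypothetical file name):
# df <- read_fao_export("FAOSTAT_rice_production.csv")
```

If the export contains several measures for each country and year, you would likely need to filter it down to the one you want (for example via its Element column) before this step.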
> head(df)
Country Year Value
1 Afghanistan 1961 210000
2 Afghanistan 1962 210000
3 Afghanistan 1963 210000
4 Afghanistan 1964 220000
5 Afghanistan 1965 220000
6 Afghanistan 1966 222000
2. Getting Continents:
library(dplyr)
library(countrycode)
# look up each country name and return its continent
df$Continent <- countrycode(sourcevar = df$Country, origin = "country.name", destination = "continent")
Warning message:
Some values were not matched unambiguously: Yugoslav SFR
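Before fixing anything by hand, it can help to list exactly which names came back as NA. A minimal sketch in base R, assuming df has the Country and Continent columns created above; the helper name and the small example data frame are made up for illustration:

```r
# Hypothetical helper: return the distinct country names whose
# Continent is NA, i.e. the names countrycode() could not match.
unmatched_countries <- function(df) {
  sort(unique(df$Country[is.na(df$Continent)]))
}

# Small made-up data frame that mimics the result above
example <- data.frame(
  Country   = c("Afghanistan", "Yugoslav SFR", "Yugoslav SFR"),
  Continent = c("Asia", NA, NA),
  stringsAsFactors = FALSE
)
unmatched_countries(example)
# [1] "Yugoslav SFR"
```

On the real data, unmatched_countries(df) gives the full list of names to patch, which is handy when the warning message is truncated for long lists.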
# print the first rows of df
> head(df)
Country Year Value Continent
1 Afghanistan 1961 210000 Asia
2 Afghanistan 1962 210000 Asia
3 Afghanistan 1963 210000 Asia
4 Afghanistan 1964 220000 Asia
5 Afghanistan 1965 220000 Asia
6 Afghanistan 1966 222000 Asia
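With a Continent column in place, the data can be summarised per continent. A minimal sketch using dplyr (already loaded above), assuming Value is the quantity you want to add up; the helper name and the example rows are made up for illustration:

```r
library(dplyr)

# Hypothetical helper: total Value per continent and year. Rows whose
# Continent is NA (unmatched names) are dropped so they do not form
# their own group.
continent_totals <- function(df) {
  df %>%
    filter(!is.na(Continent)) %>%
    group_by(Continent, Year) %>%
    summarise(Total = sum(Value), .groups = "drop")
}

# Made-up example data
example <- data.frame(
  Country   = c("Afghanistan", "Bangladesh", "Italy", "Yugoslav SFR"),
  Year      = c(1961, 1961, 1961, 1961),
  Value     = c(10, 20, 30, 40),
  Continent = c("Asia", "Asia", "Europe", NA),
  stringsAsFactors = FALSE
)
continent_totals(example)
```

Dropping the NA rows silently excludes unmatched countries from the totals, which is one more reason to fill them in first.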
The only issue with this method is the warning: some country names (here "Yugoslav SFR", the former Socialist Federal Republic of Yugoslavia) could not be matched, so their Continent values are left as NA and need another line of manual coding:
df$Continent <- ifelse(df$Country == "Yugoslav SFR", "Europe", df$Continent)
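One possibly tidier alternative is the custom_match argument of countrycode() itself: a named vector whose entries override the default matching, so the fix happens in the same call. A minimal sketch; the vector name and example data frame are made up, and only "Yugoslav SFR" comes from the warning above:

```r
library(countrycode)

# Names that countrycode() cannot match, mapped by hand to a continent.
# Extend this vector as new warnings appear.
manual_continents <- c("Yugoslav SFR" = "Europe")

# Made-up example data
example <- data.frame(
  Country = c("Afghanistan", "France", "Yugoslav SFR"),
  stringsAsFactors = FALSE
)

# custom_match entries take precedence over countrycode's default result
example$Continent <- countrycode(example$Country,
                                 origin = "country.name",
                                 destination = "continent",
                                 custom_match = manual_continents)
example
```

Keeping the manual mappings in one named vector means the fixes live next to the lookup rather than in separate ifelse() lines.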
I hope this helps. Please let me know if you have a better method!