Combining and Stacking Columns in R/Python
In our routine data-wrangling activities, we often need to combine or stack columns in a dataset to restructure it for analysis. In this post, I’d like to show how this can be done easily in R using the c()
function and data frames.
Suppose we have a data frame for countries and their corresponding scores in different columns that we want to combine vertically (this is not the same as reshape), and the columns are called: id, country, scores, id1, country1, scores1, id2, country2, scores2.
# create a sample data frame
df <- data.frame(id = c(1, 2, 3, 4, 5),
country = c("Finland", "Ireland", "Norway", "France", "Netherlands"),
scores = c(83.7, 81.7, 80.5, 80.2, 80.1),
id1 = c(39, 40, 41, 41, 43),
country1 = c("South Korea", "Panama", "Malaysia", "Saudi Arabia", "Mexico"),
scores1 = c(70.2, 70, 69.9, 69.9, 69.1),
id2 = c(77, 78, 79, 80, 81),
country2 = c("Egypt", "Cambodia", "Sri Lanka", "Bangladesh", "Laos"),
scores2 = c(56, 55.7, 55.2, 54, 53.1))
# print the df
df
id country scores id1 country1 scores1 id2 country2 scores2
1 1 Finland 83.7 39 South Korea 70.2 77 Egypt 56.0
2 2 Ireland 81.7 40 Panama 70.0 78 Cambodia 55.7
3 3 Norway 80.5 41 Malaysia 69.9 79 Sri Lanka 55.2
4 4 France 80.2 41 Saudi Arabia 69.9 80 Bangladesh 54.0
5 5 Netherlands 80.1 43 Mexico 69.1 81 Laos 53.1
Now, as you can guess already, we want to combine the first, fourth, and seventh column into one id, the second, fifth, and eighth column into one coutry, and the third, sixth, and ninth column into one score column.
Ans here's how we can do it:
df1 <- data.frame(id = c(df$id, df$id1, df$id2),
country = c(df$country, df$country1, df$country2),
scores = c(df$scores, df$scores1, df$scores2))
print(df1)
id country scores
1 1 Finland 83.7
2 2 Ireland 81.7
3 3 Norway 80.5
4 4 France 80.2
5 5 Netherlands 80.1
6 39 South Korea 70.2
7 40 Panama 70.0
8 41 Malaysia 69.9
9 41 Saudi Arabia 69.9
10 43 Mexico 69.1
11 77 Egypt 56.0
12 78 Cambodia 55.7
13 79 Sri Lanka 55.2
14 80 Bangladesh 54.0
15 81 Laos 53.1
And another fancy way of doing the same thing:
# Split the dataframe into 3 smaller dataframes
df1 <- df[, c("id", "country", "scores")]
df2 <- df[, c("id1", "country1", "scores1")]
df3 <- df[, c("id2", "country2", "scores2")]
# Rename the columns
colnames(df2) <- colnames(df1)
colnames(df3) <- colnames(df1)
# Combine the dataframes
final_df <- rbind(df1, df2, df3)
# print the df
> final_df
id country scores
1 1 Finland 83.7
2 2 Ireland 81.7
3 3 Norway 80.5
4 4 France 80.2
5 5 Netherlands 80.1
6 39 South Korea 70.2
7 40 Panama 70.0
8 41 Malaysia 69.9
9 41 Saudi Arabia 69.9
10 43 Mexico 69.1
11 77 Egypt 56.0
12 78 Cambodia 55.7
13 79 Sri Lanka 55.2
14 80 Bangladesh 54.0
15 81 Laos 53.1
And here is the Py equivalent…
# load the library
import pandas as pd
# load the df
df = pd.DataFrame({
'id': [1, 2, 3, 4, 5],
'country': ["Finland", "Ireland", "Norway", "France", "Netherlands"],
'scores': [83.7, 81.7, 80.5, 80.2, 80.1],
'id1': [39, 40, 41, 41, 43],
'country1': ["South Korea", "Panama", "Malaysia", "Saudi Arabia", "Mexico"],
'scores1': [70.2, 70, 69.9, 69.9, 69.1],
'id2': [77, 78, 79, 80, 81],
'country2': ["Egypt", "Cambodia", "Sri Lanka", "Bangladesh", "Laos"],
'scores2': [56, 55.7, 55.2, 54, 53.1]
})
# create the new data frame by combining the required columns
df1 = pd.DataFrame({
'id': list(df['id']) + list(df['id1']) + list(df['id2']),
'country': list(df['country']) + list(df['country1']) + list(df['country2']),
'scores': list(df['scores']) + list(df['scores1']) + list(df['scores2'])
})
### the fancy alternative
# subset the df into 3
df1 = df[['id', 'country', 'scores']]
df2 = df[['id1', 'country1', 'scores1']]
df3 = df[['id2', 'country2', 'scores2']]
# rename the columns and merge
df2.columns = df1.columns
df3.columns = df1.columns
final_df = pd.concat([df1, df2, df3], ignore_index=True)
… to get the same happy output
Happy stacking!