Combining and Stacking Columns in R/Python

3 min readMar 16, 2023

Photo by Leonhard Niederwimmer on Unsplash

In our routine data-wrangling activities, we often need to combine or stack columns in a dataset to restructure it for analysis. In this post, I’d like to show how this can be done easily in R using the c() function and data frames.

Suppose we have a data frame for countries and their corresponding scores in different columns that we want to combine vertically (this is not the same as reshape), and the columns are called: id, country, scores, id1, country1, scores1, id2, country2, scores2.

# create a sample data frame

df <- data.frame(id = c(1, 2, 3, 4, 5),
                 country = c("Finland", "Ireland", "Norway", "France", "Netherlands"),
                 scores = c(83.7, 81.7, 80.5, 80.2, 80.1),
                 id1 = c(39, 40, 41, 41, 43),
                 country1 = c("South Korea", "Panama", "Malaysia", "Saudi Arabia", "Mexico"),
                 scores1 = c(70.2, 70, 69.9, 69.9, 69.1),
                 id2 = c(77, 78, 79, 80, 81),
                 country2 = c("Egypt", "Cambodia", "Sri Lanka", "Bangladesh", "Laos"),
                 scores2 = c(56, 55.7, 55.2, 54, 53.1))


# print the df

df

  id     country scores id1     country1 scores1 id2   country2 scores2
1  1     Finland   83.7  39  South Korea    70.2  77      Egypt    56.0
2  2     Ireland   81.7  40       Panama    70.0  78   Cambodia    55.7
3  3      Norway   80.5  41     Malaysia    69.9  79  Sri Lanka    55.2
4  4      France   80.2  41 Saudi Arabia    69.9  80 Bangladesh    54.0
5  5 Netherlands   80.1  43       Mexico    69.1  81       Laos    53.1

Now, as you can guess already, we want to combine the first, fourth, and seventh column into one id, the second, fifth, and eighth column into one coutry, and the third, sixth, and ninth column into one score column.

Ans here's how we can do it:


df1 <- data.frame(id = c(df$id, df$id1, df$id2),
                  country = c(df$country, df$country1, df$country2),
                  scores = c(df$scores, df$scores1, df$scores2))


print(df1)

   id      country scores
1   1      Finland   83.7
2   2      Ireland   81.7
3   3       Norway   80.5
4   4       France   80.2
5   5  Netherlands   80.1
6  39  South Korea   70.2
7  40       Panama   70.0
8  41     Malaysia   69.9
9  41 Saudi Arabia   69.9
10 43       Mexico   69.1
11 77        Egypt   56.0
12 78     Cambodia   55.7
13 79    Sri Lanka   55.2
14 80   Bangladesh   54.0
15 81         Laos   53.1

And another fancy way of doing the same thing:

# Split the dataframe into 3 smaller dataframes
df1 <- df[, c("id", "country", "scores")]
df2 <- df[, c("id1", "country1", "scores1")]
df3 <- df[, c("id2", "country2", "scores2")]

# Rename the columns
colnames(df2) <- colnames(df1)
colnames(df3) <- colnames(df1)

# Combine the dataframes
final_df <- rbind(df1, df2, df3)

# print the df

> final_df
   id      country scores
1   1      Finland   83.7
2   2      Ireland   81.7
3   3       Norway   80.5
4   4       France   80.2
5   5  Netherlands   80.1
6  39  South Korea   70.2
7  40       Panama   70.0
8  41     Malaysia   69.9
9  41 Saudi Arabia   69.9
10 43       Mexico   69.1
11 77        Egypt   56.0
12 78     Cambodia   55.7
13 79    Sri Lanka   55.2
14 80   Bangladesh   54.0
15 81         Laos   53.1

And here is the Py equivalent…

# load the library
import pandas as pd

# load the df
df = pd.DataFrame({
'id': [1, 2, 3, 4, 5],
'country': ["Finland", "Ireland", "Norway", "France", "Netherlands"],
'scores': [83.7, 81.7, 80.5, 80.2, 80.1],
'id1': [39, 40, 41, 41, 43],
'country1': ["South Korea", "Panama", "Malaysia", "Saudi Arabia", "Mexico"],
'scores1': [70.2, 70, 69.9, 69.9, 69.1],
'id2': [77, 78, 79, 80, 81],
'country2': ["Egypt", "Cambodia", "Sri Lanka", "Bangladesh", "Laos"],
'scores2': [56, 55.7, 55.2, 54, 53.1]
})

# create the new data frame by combining the required columns
df1 = pd.DataFrame({
    'id': list(df['id']) + list(df['id1']) + list(df['id2']),
    'country': list(df['country']) + list(df['country1']) + list(df['country2']),
    'scores': list(df['scores']) + list(df['scores1']) + list(df['scores2'])
})


### the fancy alternative

# subset the df into 3
df1 = df[['id', 'country', 'scores']]
df2 = df[['id1', 'country1', 'scores1']]
df3 = df[['id2', 'country2', 'scores2']]

# rename the columns and merge
df2.columns = df1.columns
df3.columns = df1.columns
final_df = pd.concat([df1, df2, df3], ignore_index=True)

… to get the same happy output

Happy stacking!

Combining and Stacking Columns in R/Python

Written by infoart.ca

No responses yet