Pivot_wider without removing duplicates

Espanino · May 19, 2022, 11:17am

Hi everyone

Sorry in advance if this question is a bit basic.
I want to use pivot_wider, the goal being that the number of resulting columns are equal to the number of rows pivoted, by keeping duplicate values separate.

The question is related to a question I asked a few months ago and that got solved with the help of @JackDavison ,thank you again.
Related topic: https://forum.posit.co/t/widening-a-dataframe-by-a-set-of-columns/124322

My example dataset:

data <- data.frame(Person = c("Peter", "Peter", "Peter", "Peter", "Peter", "Peter",
                              "Carol", "Carol", "Carol", "Carol", "Carol", "Carol"),
                  GroupID = c(1, 1, 2, 2, 3, 3, 1, 1, 4, 4, 5, 5),
                  GroupTheme = c(1, 1, 1, 1, 2, 2, 1, 1, 2, 2, 2, 2),
                  Committee = c("Transport", "State", "Transport", "State", "Transport", "State",
                                "Technology", "Nature", "Technology", "Nature", "Technology", "Nature"))

I want to have one row per Person. Note that the observations for "Committee" of a Person repeat for each Group. This is intended and is the case for every "Name" in the original dataset.

The code I have used so far:

widened = function(col, pre){
  data %>%
    select(Person, {{col}}) %>% 
    distinct() %>%
    with_groups(Person, ~mutate(.x, n = row_number())) %>% 
    pivot_wider(names_from = n, values_from = {{col}}, names_prefix = pre)
}

data <- reduce(list(widened(GroupID, "GroupID_"),
            widened(GroupTheme, "GroupTheme_"),
            widened(Committee, "Committee_")), 
       left_join, by = "Person")

results in the following dataset:

 Person GroupID_1 GroupID_2 GroupID_3 GroupTheme_1 GroupTheme_2 Committee_1 Committee_2
  <chr>      <dbl>     <dbl>     <dbl>        <dbl>        <dbl> <chr>       <chr>      
1 Peter          1         2         3            1            2 Transport   State      
2 Carol          1         4         5            1            2 Technology  Nature

As you can see, there are 3 columns with GroupID_ but only 2 columns with GroupThemes_, because the maximum number of unique values for GroupTheme_ per Person is 2.

However, i want the dataset to look like this:

Person GroupID_1 GroupID_2 GroupID_3 GroupTheme_1 GroupTheme_2 GroupTheme_3 Committee_1
1  Peter         1         2         3            1            1            2   Transport
2  Carol         1         4         5            1            2            2  Technology
  Committee_2
1       State
2      Nature

Duplicate values should not get removed. This allows me to match each GroupID_ to each GroupTheme_ by number, just like it was the case in the original longer dataset.

I played around with the options of pivot_wider but not figure out a way to do this.

If you have an alternative (maybe more straightforward) way to solve the problem of being able to match each ID to Theme after pivoting wider, that is very much appreciated as well.

Thank you in advance

Espanino · May 28, 2022, 10:22am

The solution can be found here:

system · June 4, 2022, 10:23am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.