Hi everyone
Sorry in advance if this question is a bit basic.
I want to use pivot_wider, the goal being that the number of resulting columns are equal to the number of rows pivoted, by keeping duplicate values separate.
The question is related to a question I asked a few months ago and that got solved with the help of @JackDavison ,thank you again.
Related topic: https://forum.posit.co/t/widening-a-dataframe-by-a-set-of-columns/124322
My example dataset:
data <- data.frame(Person = c("Peter", "Peter", "Peter", "Peter", "Peter", "Peter",
"Carol", "Carol", "Carol", "Carol", "Carol", "Carol"),
GroupID = c(1, 1, 2, 2, 3, 3, 1, 1, 4, 4, 5, 5),
GroupTheme = c(1, 1, 1, 1, 2, 2, 1, 1, 2, 2, 2, 2),
Committee = c("Transport", "State", "Transport", "State", "Transport", "State",
"Technology", "Nature", "Technology", "Nature", "Technology", "Nature"))
I want to have one row per Person. Note that the observations for "Committee" of a Person repeat for each Group. This is intended and is the case for every "Name" in the original dataset.
The code I have used so far:
widened = function(col, pre){
data %>%
select(Person, {{col}}) %>%
distinct() %>%
with_groups(Person, ~mutate(.x, n = row_number())) %>%
pivot_wider(names_from = n, values_from = {{col}}, names_prefix = pre)
}
data <- reduce(list(widened(GroupID, "GroupID_"),
widened(GroupTheme, "GroupTheme_"),
widened(Committee, "Committee_")),
left_join, by = "Person")
results in the following dataset:
Person GroupID_1 GroupID_2 GroupID_3 GroupTheme_1 GroupTheme_2 Committee_1 Committee_2
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 Peter 1 2 3 1 2 Transport State
2 Carol 1 4 5 1 2 Technology Nature
As you can see, there are 3 columns with GroupID_ but only 2 columns with GroupThemes_, because the maximum number of unique values for GroupTheme_ per Person is 2.
However, i want the dataset to look like this:
Person GroupID_1 GroupID_2 GroupID_3 GroupTheme_1 GroupTheme_2 GroupTheme_3 Committee_1
1 Peter 1 2 3 1 1 2 Transport
2 Carol 1 4 5 1 2 2 Technology
Committee_2
1 State
2 Nature
Duplicate values should not get removed. This allows me to match each GroupID_ to each GroupTheme_ by number, just like it was the case in the original longer dataset.
I played around with the options of pivot_wider but not figure out a way to do this.
If you have an alternative (maybe more straightforward) way to solve the problem of being able to match each ID to Theme after pivoting wider, that is very much appreciated as well.
Thank you in advance