I want to combine duplicate rows into a one with multiple columns for the unique info. I was able to figure out a couple of ways using the tidyverse, but I'm wondering if there is a better way than what I've come up with.
I could use another set of eyes on this problem.
It is sort of the reverse of what was done in Tidy way to split a column.
data.frame(
id = c(1, 1, 2, 2, 3),
code = c("A", "B", "C", "D", "E")
)
#> id code
#> 1 1 A
#> 2 1 B
#> 3 2 C
#> 4 2 D
#> 5 3 E
library(tidyverse)
(df <- tibble::tibble(
id = c(1, 1, 2, 2, 3, 4, 4),
code = c("A", "B", "C", "D", "E", NA, NA)))
#> # A tibble: 7 x 2
#> id code
#> <dbl> <chr>
#> 1 1 A
#> 2 1 B
#> 3 2 C
#> 4 2 D
#> 5 3 E
#> 6 4 <NA>
#> 7 4 <NA>
(max_dups <- df %>%
dplyr::count(id) %>%
dplyr::pull(n) %>%
max(.))
#> [1] 2
df %>%
group_by(id) %>%
summarise(combo_code = paste0(code, collapse = "~~~~")) %>%
tidyr::separate(.,
col = combo_code,
into = paste("new_code", 1:max_dups, sep = "_"),
sep = "~~~~",
remove = TRUE,
fill = "right") %>%
mutate_at(.vars = vars(paste("new_code", 1:max_dups, sep = "_")),
.funs = funs(dplyr::na_if(., "NA")))
#> # A tibble: 4 x 3
#> id new_code_1 new_code_2
#> <dbl> <chr> <chr>
#> 1 1 A B
#> 2 2 C D
#> 3 3 E <NA>
#> 4 4 <NA> <NA>
Created on 2018-06-25 by the reprex package (v0.2.0).
I figured out a way to do it with tidyr::spread()
but it does't seem as fool-proof.
## Another way ----------------
(df <- tibble::tibble(
id = c(1, 1, 2, 2, 3, 4, 4),
code = c("A", "B", "C", "D", "E", NA, NA)))
#> # A tibble: 7 x 2
#> id code
#> <dbl> <chr>
#> 1 1 A
#> 2 1 B
#> 3 2 C
#> 4 2 D
#> 5 3 E
#> 6 4 <NA>
#> 7 4 <NA>
library(tidyverse)
df %>%
group_by(id) %>%
mutate(dup_count = sequence(dplyr::n()),
key = paste("new_code", dup_count, sep = "_")) %>%
tidyr::spread(.,
key = key,
value = code) %>%
tidyr::fill(dplyr::starts_with("new_code"), .direction = "up") %>%
dplyr::distinct(., id, .keep_all = TRUE)
#> # A tibble: 4 x 4
#> # Groups: id [4]
#> id dup_count new_code_1 new_code_2
#> <dbl> <int> <chr> <chr>
#> 1 1 1 A B
#> 2 2 1 C D
#> 3 3 1 E <NA>
#> 4 4 1 <NA> <NA>
Created on 2018-06-25 by the reprex package (v0.2.0).