amare
August 1, 2022, 10:44am
Hi everyone, I have a question that has taken me a long time to figure out.
I have the data frame in the code below and would like to keep only one row per unique value of the column char. However, I do not want to lose the information in the columns x and y when I remove the duplicates.
L3 <- LETTERS[1:3]
char <- sample(L3, 10, replace = TRUE)
print( data.frame(x = rep(c("w","z","x","g","h"), c(2,2,2,3,1)), y = 1:10, char = char))
Is there any possible way to restructure the data frame while removing duplicates and keeping column information corresponding to duplicate items?
I want to get a data frame like the one below:
  char        x                  y
1    A      x,g                5,6
2    B  w,z,x,g  1,2,3,4,5,6,7,8,9
3    C      z,n                3,4
Best,
Amare
A tidyverse solution would be:
library(tidyverse)
L3 <- LETTERS[1:3]
char <- sample(L3, 10, replace = TRUE)
df <- data.frame(x = rep(c("w","z","x","g","h"), c(2,2,2,3,1)), y = 1:10, char = char)
df |>
  group_by(char) |>
  summarise(across(c(x, y), ~ str_flatten(unique(.x), collapse = ", ")))
#> # A tibble: 3 × 3
#>   char  x       y
#>   <chr> <chr>   <chr>
#> 1 A     w, z, g 2, 4, 7, 8
#> 2 B     w, g, h 1, 9, 10
#> 3 C     z, x    3, 5, 6
Created on 2022-08-01 by the reprex package (v2.0.1)
As an alternative, consider nesting your data:
(mydata <- data.frame(
  stringsAsFactors = FALSE,
  x = c("w", "w", "z", "z", "x", "x", "g", "g", "g", "h"),
  y = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L),
  char = c("C", "A", "C", "A", "A", "B", "C", "C", "B", "B")
))
library(tidyverse)
(mydata_nested <- nest(mydata, data = c(x, y)))
mydata_nested$data
This approach is now more convenient than ever as there is a new nplyr package on CRAN.
A Grammar of Nested Data Manipulation • nplyr (markjrieke.github.io)
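If you still want the comma-separated strings after nesting, something like the following might work. This is only a minimal sketch, assuming tidyr is installed; it rebuilds the same mydata as above, and the helper name collapse_unique is made up for illustration.

```r
library(tidyr)

mydata <- data.frame(
  stringsAsFactors = FALSE,
  x = c("w", "w", "z", "z", "x", "x", "g", "g", "g", "h"),
  y = 1:10,
  char = c("C", "A", "C", "A", "A", "B", "C", "C", "B", "B")
)

# One row per value of char; x and y are tucked into a list-column named data
mydata_nested <- nest(mydata, data = c(x, y))

# Hypothetical helper: unique values of a vector pasted into a single string
collapse_unique <- function(v) paste(unique(v), collapse = ",")

# Collapse each nested data frame into one string per column
mydata_nested$x <- vapply(mydata_nested$data, function(d) collapse_unique(d$x), character(1))
mydata_nested$y <- vapply(mydata_nested$data, function(d) collapse_unique(d$y), character(1))

mydata_nested
```

The list-column stays in place, so the original rows are still available in mydata_nested$data if you need them later.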
I may have misunderstood, but I find my function SortedUniqueList useful in situations like this:
SortedUniqueList <- function(vectorin, sep = "/") {
  paste(unique(sort(vectorin, na.last = TRUE)), collapse = sep)
}
outdata <- mydata %>%
  group_by(char) %>%
  summarise(x = SortedUniqueList(x, sep = ","), y = SortedUniqueList(y, sep = ","))
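If you would rather avoid extra packages, the same idea can probably be written with base R's aggregate(). This is a rough sketch under that assumption, not a replacement for the answers above; it uses the same mydata as in the nesting example, and the name outdata_base is just for illustration.

```r
mydata <- data.frame(
  stringsAsFactors = FALSE,
  x = c("w", "w", "z", "z", "x", "x", "g", "g", "g", "h"),
  y = 1:10,
  char = c("C", "A", "C", "A", "A", "B", "C", "C", "B", "B")
)

# aggregate() applies FUN to every column in the first argument,
# separately for each group defined by `by`
outdata_base <- aggregate(
  mydata[c("x", "y")],
  by = mydata["char"],
  FUN = function(v) paste(unique(v), collapse = ",")
)

outdata_base
```

Unlike SortedUniqueList, this keeps the values in their order of first appearance instead of sorting them.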