Extracting unique strings in mutate failing

I'm trying to assemble a list of files in a data frame column, but keeping only the unique values is giving me fits. I tried using lists, then character strings, and failed with both. The group_by() calls are part of the process. (The actual application involves census blocks, census block groups, and sea-level rise raster files.)


library(tidyverse)
library(stringr)

foo1 <- tribble(~Grpby, ~Listvar,
               "aa",    "M",
               "aa",    "N",
               "ba",    "M",
               "ca",    "N,O",
               "ca",    "N",
               "cb",    "M"
)

foo2 <- foo1 %>%
  group_by(Grpby) %>%
  reframe(Listvar = paste(Listvar, collapse = ","),
          Grpby2 = str_sub(Grpby, 2, 2))

foo3 <- foo2 %>%
  group_by(Grpby2) %>%
  summarise(Listvar = first(paste(Listvar, collapse = ",")))

foo3

foo4 <- foo3 %>% 
  mutate(Newvar=unique(unlist(str_split(Listvar, pattern=","))))


foo3:

Grpby2   Listvar
<chr>    <chr>
a        M,N,M,N,M,N,O,N,N,O,N
b        M

Then:
Error in `mutate()`:
ℹ In argument: `Newvar = unique(unlist(str_split(Listvar, pattern = ",")))`.
Caused by error:
! `Newvar` must be size 2 or 1, not 3.
Backtrace:
 1. foo3 %>% ...
 9. dplyr:::dplyr_internal_error(...)

I'm not sure what your goal is. Do you want the unique values of Listvar, grouped by the second letter of Grpby and stored as comma-separated values in a single row? I think this does that.

library(tidyverse)

foo1 <- tribble(~Grpby, ~Listvar,
                "aa",    "M",
                "aa",    "N",
                "ba",    "M",
                "ca",    "N,O",
                "ca",    "N",
                "cb",    "M"
)

Cnt <- max(str_count(foo1$Listvar, ","))
foo1 |>
  mutate(Grpby = str_sub(Grpby, 2, 2)) |>
  separate(col = "Listvar", into = LETTERS[1:(Cnt + 1)], sep = ",", fill = "right") |>
  pivot_longer(cols = -Grpby) |>
  na.omit() |>
  select(-name) |>
  group_by(Grpby) |>
  distinct() |>
  summarize(New = paste(value, collapse = ","))
#> # A tibble: 2 × 2
#>   Grpby New  
#>   <chr> <chr>
#> 1 a     M,N,O
#> 2 b     M

Created on 2024-12-05 with reprex v2.1.1
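
The separate() plus pivot_longer() steps above split each comma-separated cell into one row per value. As a hedged alternative sketch, the same reshaping can be done in a single step with separate_longer_delim(), assuming tidyr 1.3.0 or later is installed. The name res_long is only illustrative and does not appear in the original posts.

```r
library(tidyverse)

foo1 <- tribble(~Grpby, ~Listvar,
                "aa",    "M",
                "aa",    "N",
                "ba",    "M",
                "ca",    "N,O",
                "ca",    "N",
                "cb",    "M"
)

# Assumption: tidyr >= 1.3.0, which provides separate_longer_delim()
res_long <- foo1 |>
  mutate(Grpby = str_sub(Grpby, 2, 2)) |>          # keep only the second letter
  separate_longer_delim(Listvar, delim = ",") |>   # one row per value
  distinct() |>                                    # drop repeated Grpby/value pairs
  group_by(Grpby) |>
  summarize(New = paste(Listvar, collapse = ","))  # back to one string per group

res_long
```

This avoids having to compute Cnt up front, since the number of values per cell no longer needs to be known.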

Maybe I over-simplified for the reprex. Listvar is actually a collection of rather long filenames, and not single letters. Additionally, the successive group_by's are a necessary part of the process - I've stripped out a lot of other stuff going on.

Basically I want foo4 to end up with just a list of unique strings in Listvar.

Is this what you are after? Notice I changed the initial data to add more values to Listvar.

library(tidyverse)
foo1 <- tribble(~Grpby, ~Listvar,
                "aa",    "M",
                "aa",    "N",
                "ba",    "M,P",
                "ca",    "N,O,Q",
                "ca",    "N",
                "cb",    "M"
)

foo2 <- foo1 %>%
  group_by(Grpby) %>%
  reframe(Listvar = str_split_1(paste(Listvar, collapse = ","), pattern = ",")) %>%
  mutate(Grpby2 = str_sub(Grpby, 2, 2))

foo3 <- foo2 %>% 
  group_by(Grpby2) %>% 
  summarize(Listvar=paste(unique(Listvar), collapse = ",")) 

foo3
#> # A tibble: 2 × 2
#>   Grpby2 Listvar  
#>   <chr>  <chr>    
#> 1 a      M,N,P,O,Q
#> 2 b      M

Created on 2024-12-05 with reprex v2.1.1
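
For completeness, the original error arises because mutate() needs each new column to have either one value or one value per row, while unique(unlist(str_split(...))) flattens every row of Listvar into a single vector (here of length 3). If foo3 has to stay as it was in the question, one possible sketch is to de-duplicate row by row with purrr's map() and map_chr(). The column names Parts and Newlist are hypothetical, and foo3 is rebuilt by hand from the output printed in the question.

```r
library(tidyverse)

# foo3 as printed in the original question (rebuilt by hand for this sketch)
foo3 <- tibble(Grpby2  = c("a", "b"),
               Listvar = c("M,N,M,N,M,N,O,N,N,O,N", "M"))

foo4 <- foo3 |>
  mutate(
    # str_split() returns a list with one character vector per row
    Parts   = str_split(Listvar, pattern = ","),
    # list-column holding the unique values of each row
    Newlist = map(Parts, unique),
    # the same unique values collapsed back into one string per row
    Newvar  = map_chr(Newlist, \(x) paste(x, collapse = ","))
  ) |>
  select(-Parts)

foo4
```

Newlist keeps the values as a list-column, which may be more convenient than a comma-separated string when the entries are long filenames; Newvar is the string form.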

Thank you!! I think that will do it
