select columns with different values grouped by one

juandmaz · August 10, 2023, 6:52pm

I have this df

df<- data.frame(
  document= c("A", "B", "A", "B"),
  column1 = c(1, 2, 3, 4),
  column2 = c(3, 3, 4, 4),
  column3 = c(5, 6, 5, 6),
  column4= c(1,1,1,1),
  column5= c(1,2,1,2)
)

I need to select the columns that have not the same values grouped by "document". So in my df I need to choose only the column 3,4 and 5.
Why this doesnt work?

df %>%
  group_by(document) %>%
  select_if(~length(unique(.)) > 1)

It selects all the colunm except the 4

scottyd22 · August 10, 2023, 8:32pm

Hi @juandmaz! I'm not exactly sure why the code you provided is not working, but here is alternative approach using pivot_longer() and pivot_wider()that I believe arrives at the desired outcome.

library(tidyverse)

# select columns whose values differ when grouped 
df |>
  mutate(row = row_number()) |>
  pivot_longer(cols = c(-'document', -'row')) |>
  group_by(document, name) |>
  filter(length(unique(value)) > 1) |>
  ungroup() |>
  pivot_wider(names_from = name, values_from = value) |>
  select(-row)
#> # A tibble: 4 × 3
#>   document column1 column2
#>   <chr>      <dbl>   <dbl>
#> 1 A              1       3
#> 2 B              2       3
#> 3 A              3       4
#> 4 B              4       4

# select columns whose values are the same when grouped 
df |>
  mutate(row = row_number()) |>
  pivot_longer(cols = c(-'document', -'row')) |>
  group_by(document, name) |>
  filter(length(unique(value)) == 1) |>
  ungroup() |>
  pivot_wider(names_from = name, values_from = value) |>
  select(-row)
#> # A tibble: 4 × 4
#>   document column3 column4 column5
#>   <chr>      <dbl>   <dbl>   <dbl>
#> 1 A              5       1       1
#> 2 B              6       1       2
#> 3 A              5       1       1
#> 4 B              6       1       2

Created on 2023-08-10 with reprex v2.0.2

juandmaz · August 14, 2023, 7:22pm

It works, thanks!
Could it be that NA does not count as dissent?
Do you know why my way didnt worked?

nirgrahamuk · August 15, 2023, 10:17am

the predicate functions select_if, mutate_if dont consider groupings, this is documented.
another way might be

library(tidyverse)
df<- data.frame(
  document= c("A", "B", "A", "B"),
  column1 = c(1, 2, 3, 4),
  column2 = c(3, 3, 4, 4),
  column3 = c(5, 6, 5, 6),
  column4= c(1,1,1,1),
  column5= c(1,2,1,2)
)

(header <- df %>%
  group_by(document) %>%
  summarise(across(everything(),
            ~length(unique(.)) == 1)) |>
              summarise(across(where(is.logical),
                        ~all(.)))  |> 
    pivot_longer(everything()) |> 
    deframe())

(to_keep <- c('document',names(header)[header]))
df |> select(document,
             all_of(to_keep))

Though scotty's pivot longer and then pivot_wider approach is more succinct

juandmaz · August 17, 2023, 6:25pm

I did not know that the select_if and mutate_if functions do not allow grouping.
your method works but I would need to apply it without creating objects in the middle, is that possible?

system · September 28, 2023, 6:26pm

This topic was automatically closed 42 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.