select columns with different values grouped by one

I have this df

df<- data.frame(
  document= c("A", "B", "A", "B"),
  column1 = c(1, 2, 3, 4),
  column2 = c(3, 3, 4, 4),
  column3 = c(5, 6, 5, 6),
  column4= c(1,1,1,1),
  column5= c(1,2,1,2)
)

I need to select the columns that have not the same values grouped by "document". So in my df I need to choose only the column 3,4 and 5.
Why this doesnt work?

df %>%
  group_by(document) %>%
  select_if(~length(unique(.)) > 1)

It selects all the colunm except the 4

Hi @juandmaz! I'm not exactly sure why the code you provided is not working, but here is alternative approach using pivot_longer() and pivot_wider()that I believe arrives at the desired outcome.

library(tidyverse)

# select columns whose values differ when grouped 
df |>
  mutate(row = row_number()) |>
  pivot_longer(cols = c(-'document', -'row')) |>
  group_by(document, name) |>
  filter(length(unique(value)) > 1) |>
  ungroup() |>
  pivot_wider(names_from = name, values_from = value) |>
  select(-row)
#> # A tibble: 4 × 3
#>   document column1 column2
#>   <chr>      <dbl>   <dbl>
#> 1 A              1       3
#> 2 B              2       3
#> 3 A              3       4
#> 4 B              4       4

# select columns whose values are the same when grouped 
df |>
  mutate(row = row_number()) |>
  pivot_longer(cols = c(-'document', -'row')) |>
  group_by(document, name) |>
  filter(length(unique(value)) == 1) |>
  ungroup() |>
  pivot_wider(names_from = name, values_from = value) |>
  select(-row)
#> # A tibble: 4 × 4
#>   document column3 column4 column5
#>   <chr>      <dbl>   <dbl>   <dbl>
#> 1 A              5       1       1
#> 2 B              6       1       2
#> 3 A              5       1       1
#> 4 B              6       1       2

Created on 2023-08-10 with reprex v2.0.2

It works, thanks!
Could it be that NA does not count as dissent?
Do you know why my way didnt worked?

the predicate functions select_if, mutate_if dont consider groupings, this is documented.
another way might be

library(tidyverse)
df<- data.frame(
  document= c("A", "B", "A", "B"),
  column1 = c(1, 2, 3, 4),
  column2 = c(3, 3, 4, 4),
  column3 = c(5, 6, 5, 6),
  column4= c(1,1,1,1),
  column5= c(1,2,1,2)
)

(header <- df %>%
  group_by(document) %>%
  summarise(across(everything(),
            ~length(unique(.)) == 1)) |>
              summarise(across(where(is.logical),
                        ~all(.)))  |> 
    pivot_longer(everything()) |> 
    deframe())

(to_keep <- c('document',names(header)[header]))
df |> select(document,
             all_of(to_keep))

Though scotty's pivot longer and then pivot_wider approach is more succinct

I did not know that the select_if and mutate_if functions do not allow grouping.
your method works but I would need to apply it without creating objects in the middle, is that possible?

This topic was automatically closed 42 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.