I have this df
df<- data.frame(
document= c("A", "B", "A", "B"),
column1 = c(1, 2, 3, 4),
column2 = c(3, 3, 4, 4),
column3 = c(5, 6, 5, 6),
column4= c(1,1,1,1),
column5= c(1,2,1,2)
)
I need to select the columns that have not the same values grouped by "document". So in my df I need to choose only the column 3,4 and 5.
Why this doesnt work?
df %>%
group_by(document) %>%
select_if(~length(unique(.)) > 1)
It selects all the colunm except the 4
Hi @juandmaz ! I'm not exactly sure why the code you provided is not working, but here is alternative approach using pivot_longer()
and pivot_wider()
that I believe arrives at the desired outcome.
library(tidyverse)
# select columns whose values differ when grouped
df |>
mutate(row = row_number()) |>
pivot_longer(cols = c(-'document', -'row')) |>
group_by(document, name) |>
filter(length(unique(value)) > 1) |>
ungroup() |>
pivot_wider(names_from = name, values_from = value) |>
select(-row)
#> # A tibble: 4 × 3
#> document column1 column2
#> <chr> <dbl> <dbl>
#> 1 A 1 3
#> 2 B 2 3
#> 3 A 3 4
#> 4 B 4 4
# select columns whose values are the same when grouped
df |>
mutate(row = row_number()) |>
pivot_longer(cols = c(-'document', -'row')) |>
group_by(document, name) |>
filter(length(unique(value)) == 1) |>
ungroup() |>
pivot_wider(names_from = name, values_from = value) |>
select(-row)
#> # A tibble: 4 × 4
#> document column3 column4 column5
#> <chr> <dbl> <dbl> <dbl>
#> 1 A 5 1 1
#> 2 B 6 1 2
#> 3 A 5 1 1
#> 4 B 6 1 2
Created on 2023-08-10 with reprex v2.0.2
It works, thanks!
Could it be that NA does not count as dissent?
Do you know why my way didnt worked?
the predicate functions select_if, mutate_if dont consider groupings, this is documented.
another way might be
library(tidyverse)
df<- data.frame(
document= c("A", "B", "A", "B"),
column1 = c(1, 2, 3, 4),
column2 = c(3, 3, 4, 4),
column3 = c(5, 6, 5, 6),
column4= c(1,1,1,1),
column5= c(1,2,1,2)
)
(header <- df %>%
group_by(document) %>%
summarise(across(everything(),
~length(unique(.)) == 1)) |>
summarise(across(where(is.logical),
~all(.))) |>
pivot_longer(everything()) |>
deframe())
(to_keep <- c('document',names(header)[header]))
df |> select(document,
all_of(to_keep))
Though scotty's pivot longer and then pivot_wider approach is more succinct
nirgrahamuk:
select_if, mutate_if
I did not know that the select_if and mutate_if functions do not allow grouping.
your method works but I would need to apply it without creating objects in the middle, is that possible?
system
Closed
September 28, 2023, 6:26pm
7
This topic was automatically closed 42 days after the last reply. New replies are no longer allowed. If you have a query related to it or one of the replies, start a new topic and refer back with a link.