Irakli
August 6, 2022, 3:19pm
1
I cannot share the data, but the principle of what I want to achieve is easy - across several columns, I want to filter out the rows that do not change (from row to row, but only when non of the columns change). Overall, I want to track changes in the rows, if this makes it more simple.
What would be the best way to filter the dataset that way?
Thank you in advance.
FJCC
August 6, 2022, 4:48pm
2
Here is one possible solution, though it isn't very elegant.
library(dplyr)
DF <- data.frame(A=c(1,1,1,1,1),B=c(1,3,1,1,1),C=c(1,1,1,1,2))
#Row 4 should be removed
DF
#> A B C
#> 1 1 1 1
#> 2 1 3 1
#> 3 1 1 1
#> 4 1 1 1
#> 5 1 1 2
lagged <- rbind(c(NA,NA,NA),DF[-nrow(DF),])
colnames(lagged) <- paste("L",colnames(lagged),sep="_")
lagged
#> L_A L_B L_C
#> 1 NA NA NA
#> 2 1 1 1
#> 3 1 3 1
#> 4 1 1 1
#> 5 1 1 1
Combinded <- cbind(DF,lagged)
Combinded
#> A B C L_A L_B L_C
#> 1 1 1 1 NA NA NA
#> 2 1 3 1 1 1 1
#> 3 1 1 1 1 3 1
#> 4 1 1 1 1 1 1
#> 5 1 1 2 1 1 1
Combinded <- Combinded |> rowwise() |>
mutate(Flag = all(c_across(A:C) == c_across(L_A:L_C)))
DF <- Combinded |> filter(is.na(Flag) | !Flag) |>
select(A:C)
DF
#> # A tibble: 4 x 3
#> # Rowwise:
#> A B C
#> <dbl> <dbl> <dbl>
#> 1 1 1 1
#> 2 1 3 1
#> 3 1 1 1
#> 4 1 1 2
Created on 2022-08-06 by the reprex package (v2.0.1)
Irakli
August 7, 2022, 10:43am
3
Thank you so much for taking the time to go through this! Do you think I can somehow replace lagging step with diff() function?
FJCC
August 7, 2022, 3:56pm
4
library(dplyr)
DF <- data.frame(A=c(1,1,1,1,1),B=c(1,3,1,1,1),C=c(1,1,1,1,2))
DF
#> A B C
#> 1 1 1 1
#> 2 1 3 1
#> 3 1 1 1
#> 4 1 1 1
#> 5 1 1 2
DF <- DF |> mutate(across(.cols = everything(),
.fns = function(V) c(NA,diff(V)),
.names = "L_{.col}"))
DF
#> A B C L_A L_B L_C
#> 1 1 1 1 NA NA NA
#> 2 1 3 1 0 2 0
#> 3 1 1 1 0 -2 0
#> 4 1 1 1 0 0 0
#> 5 1 1 2 0 0 1
DF <- DF|> rowwise() |>
mutate(Flag = all(c_across(L_A:L_C)==0))
DF
#> # A tibble: 5 x 7
#> # Rowwise:
#> A B C L_A L_B L_C Flag
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <lgl>
#> 1 1 1 1 NA NA NA NA
#> 2 1 3 1 0 2 0 FALSE
#> 3 1 1 1 0 -2 0 FALSE
#> 4 1 1 1 0 0 0 TRUE
#> 5 1 1 2 0 0 1 FALSE
DF <- DF |> filter(is.na(Flag) | !Flag) |>
select(A:C)
DF
#> # A tibble: 4 x 3
#> # Rowwise:
#> A B C
#> <dbl> <dbl> <dbl>
#> 1 1 1 1
#> 2 1 3 1
#> 3 1 1 1
#> 4 1 1 2
Created on 2022-08-07 by the reprex package (v2.0.1)
1 Like
system
Closed
August 14, 2022, 3:57pm
5
This topic was automatically closed 7 days after the last reply. New replies are no longer allowed. If you have a query related to it or one of the replies, start a new topic and refer back with a link.