Hi RStudio community,
I have the following data, and I would like to drop consecutive repeated values of x within each id after sorting by all three columns (id, date, x). For example, I would like to drop the highlighted observations in the dataset below.
I used the code below, but it filters repeated x without taking the other columns, such as id, into account (e.g., it drops row 18: 5 2021-02-18 B). If somebody can help me, I would appreciate it.
Thanks
data <- data.frame(
  id = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L),
  date = c("2020-01-20", "2021-04-25", "2021-08-12", "2021-03-15", "2021-05-17", "2021-07-19",
           "2021-03-15", "2021-05-16", "2021-06-17", "2021-08-18",
           "2021-08-18", "2021-02-11", "2021-08-18", "2021-03-19", "2021-06-11", "2021-06-11", "2021-10-01",
           "2021-02-18", "2021-04-12", "2021-09-13", "2021-06-07", "2021-08-08", "2021-10-18"),
  x = factor(c("A", "B", "C", "A", "A", "B", "A", "B", "B", "C", "A", "A", "B", "C", "A",
               "A", "B", "B", "A", "A", "B", "B", "C")),
  stringsAsFactors = FALSE
)
id date x
1 1 2020-01-20 A
2 1 2021-04-25 B
3 1 2021-08-12 C
4 2 2021-03-15 A
**5 2 2021-05-17 A**
6 2 2021-07-19 B
7 3 2021-03-15 A
8 3 2021-05-16 B
**9 3 2021-06-17 B**
10 3 2021-08-18 C
12 4 2021-02-11 A
14 4 2021-03-19 C
11 4 2021-08-18 A
13 4 2021-08-18 B
18 5 2021-02-18 B
15 5 2021-06-11 A
**16 5 2021-06-11 A**
17 5 2021-10-01 B
19 6 2021-04-12 A
21 6 2021-06-07 B
**22 6 2021-08-08 B**
20 6 2021-09-13 A
23 6 2021-10-18 C
# filter repeated x by id
library(dplyr)
data2 <- data %>% filter(x != lag(x, default = "1"))
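
For reference, here is a minimal sketch of one possible fix, not a confirmed answer. The ungrouped `lag()` compares each row with the previous row of the whole data frame, so the first row of an id can be dropped when the last row of the previous id has the same x. Sorting first and then grouping by id makes `lag()` restart within each id. The sketch assumes the intended sort order is id, then date, then x, and that the first row of each run of repeated x should be kept. The data frame is repeated here only so the block runs on its own.

```r
library(dplyr)

# Same data as in the post, repeated so the sketch is self-contained
data <- data.frame(
  id = rep(1:6, times = c(3, 3, 4, 4, 4, 5)),
  date = c("2020-01-20", "2021-04-25", "2021-08-12", "2021-03-15", "2021-05-17", "2021-07-19",
           "2021-03-15", "2021-05-16", "2021-06-17", "2021-08-18",
           "2021-08-18", "2021-02-11", "2021-08-18", "2021-03-19",
           "2021-06-11", "2021-06-11", "2021-10-01", "2021-02-18",
           "2021-04-12", "2021-09-13", "2021-06-07", "2021-08-08", "2021-10-18"),
  x = factor(c("A", "B", "C", "A", "A", "B", "A", "B", "B", "C", "A", "A", "B", "C",
               "A", "A", "B", "B", "A", "A", "B", "B", "C")),
  stringsAsFactors = FALSE
)

data2 <- data %>%
  arrange(id, date, x) %>%   # assumed sort order: id, then date, then x
  group_by(id) %>%           # lag() now only looks at rows of the same id
  # keep the first row of each id, and any row whose x differs from the previous row
  filter(row_number() == 1 | x != lag(x)) %>%
  ungroup()

print(data2, n = Inf)
```

On the data above, this should keep 19 of the 23 rows: one row is dropped from each of the four highlighted pairs, and row 18 (5 2021-02-18 B) is kept because it is the first row of id 5. The dates sort correctly as character strings only because they are in ISO `YYYY-MM-DD` format; converting them with `as.Date()` first would be safer for other formats.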