library(dplyr)
df <- data.frame(id = c(1,2,3,1,3,4,3), ch = c(1:7))
df
#> id ch
#> 1 1 1
#> 2 2 2
#> 3 3 3
#> 4 1 4
#> 5 3 5
#> 6 4 6
#> 7 3 7
# This code returns the unique records and the first row for each duplicated id,
# but I would like to return the last row for each duplicated id instead.
df %>%
  group_by(id) %>%
  filter(row_number() == 1)
#> # A tibble: 4 x 2
#> # Groups: id [4]
#> id ch
#> <dbl> <int>
#> 1 1 1
#> 2 2 2
#> 3 3 3
#> 4 4 6
Created on 2018-10-04 by the reprex package (v0.2.1)
You can always use max() to return the last row:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
df <- data.frame(id = c(1,2,3,1,3,4,3), ch = c(1:7))
df %>%
  group_by(id) %>%
  arrange(ch) %>%
  filter(row_number() == max(row_number()))
#> # A tibble: 4 x 2
#> # Groups: id [4]
#> id ch
#> <dbl> <int>
#> 1 2 2
#> 2 1 4
#> 3 4 6
#> 4 3 7
Created on 2018-10-04 by the reprex package (v0.2.1)
I've added arrange() to the pipeline because otherwise I don't see how you want to decide which row is last.
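If the existing row order already defines "last", a shorter variant might be enough. This is only a sketch under that assumption: n() gives the group size, so slice(n()) keeps the final row of each id, and the base R line uses duplicated() with fromLast = TRUE to do the same without dplyr. The names last_rows and last_rows_base are just placeholders for illustration.

```r
library(dplyr)

df <- data.frame(id = c(1, 2, 3, 1, 3, 4, 3), ch = 1:7)

# dplyr: keep the last row of each id, relying on the current row order
last_rows <- df %>%
  group_by(id) %>%
  slice(n()) %>%
  ungroup()

# base R: drop every row whose id appears again later in the data
last_rows_base <- df[!duplicated(df$id, fromLast = TRUE), ]
```

Note that the dplyr version returns rows ordered by id, while the base R version keeps them in their original order. If "last" should instead mean the largest ch, sort first (as in the arrange() example above) before taking the last row.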