Return the unique records and the last duplicated row

Khundiman · October 4, 2018, 3:49pm


library(dplyr)

df <- data.frame(id = c(1,2,3,1,3,4,3), ch = c(1:7))

df
#>   id ch
#> 1  1  1
#> 2  2  2
#> 3  3  3
#> 4  1  4
#> 5  3  5
#> 6  4  6
#> 7  3  7

# This code returns one row per id, keeping the first row of any duplicated id. I would like to keep the last row of each duplicated id instead.

df %>%
  group_by(id) %>%
  filter(row_number() == 1)
#> # A tibble: 4 x 2
#> # Groups:   id [4]
#>      id    ch
#>   <dbl> <int>
#> 1     1     1
#> 2     2     2
#> 3     3     3
#> 4     4     6

Created on 2018-10-04 by the reprex package (v0.2.1)

mishabalyasin · October 4, 2018, 3:55pm

You can always use max() on row_number() to return the last row of each group:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

df <- data.frame(id = c(1,2,3,1,3,4,3), ch = c(1:7))

df %>%
  group_by(id) %>%
  arrange(ch) %>%
  filter(row_number() == max(row_number()))
#> # A tibble: 4 x 2
#> # Groups:   id [4]
#>      id    ch
#>   <dbl> <int>
#> 1     2     2
#> 2     1     4
#> 3     4     6
#> 4     3     7

Created on 2018-10-04 by the reprex package (v0.2.1)

I've added arrange() to the pipeline because, without an explicit ordering, it isn't clear which row should count as the last one. Here I sort by ch.
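
For comparison, here is a minimal sketch of two other ways to keep the last row per id, using the same example data. It assumes, as in the answer above, that ch defines the row order. The names last_dplyr and last_base are only illustrative.

```r
library(dplyr)

df <- data.frame(id = c(1, 2, 3, 1, 3, 4, 3), ch = 1:7)

# Option 1 (dplyr): sort by ch, then keep the final row of each id group.
# n() is the size of the current group, so row_number() == n() marks its last row.
last_dplyr <- df %>%
  arrange(ch) %>%
  group_by(id) %>%
  filter(row_number() == n()) %>%
  ungroup()

# Option 2 (base R): duplicated(fromLast = TRUE) flags every occurrence of an id
# except its last one, so negating it keeps only the last occurrence.
# Note: this relies on the existing row order rather than sorting by ch.
last_base <- df[!duplicated(df$id, fromLast = TRUE), ]

# With this data, both options keep the rows where ch is 2, 4, 6 and 7.
```

If you are on dplyr 1.0.0 or later, slice_tail(n = 1) applied to the grouped data should give the same rows. The base R version is handy when the data frame is already in the order you care about.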

Khundiman · October 4, 2018, 3:59pm

Thanks @mishabalyasin