Sorted vector of unique elements with data.table

ttrodrigz · August 12, 2019, 3:31pm

I'm trying to come up with the most efficient data.table version of an operation I typically perform using dplyr. I have a solution, but I'm wondering if anyone has a cleaner/more efficient answer to this simple task. All I'm doing is returning a sorted vector of the unique elements in a column.

`dplyr` version

library(tibble)
library(dplyr)

x.tbl <- tibble(
    a = c("b", "a", "b", "a"),
    b = 1:4
)

x.tbl %>%
    distinct(a) %>%
    pull() %>%
    sort()
#> [1] "a" "b"

`data.table` version

library(data.table)

x.dt <- as.data.table(x.tbl)

sort(x.dt[, .N, by = a][, a])
#> [1] "a" "b"

Can anyone suggest a more efficient or cleaner way of doing this with data.table? Is there a way to eliminate the data.table chaining and the wrapping in sort()?

Thanks!

martin.R · August 12, 2019, 4:36pm

There may be a better alternative to sort, but this works:
x.dt[, sort(unique(a))]

Also, note that if you use keyby rather than by, the output will already be sorted. I believe it's also technically quicker, but it is not the default for backward compatibility.
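To make the by/keyby difference concrete, here is a minimal sketch that rebuilds the example data from the question directly as a data.table. The variable names `by_result`, `keyby_result`, and `sorted_unique` are only illustrative.

```r
library(data.table)

# Same example data as in the question, built directly as a data.table
x.dt <- data.table(a = c("b", "a", "b", "a"), b = 1:4)

# by: groups come back in order of first appearance
by_result <- x.dt[, .N, by = a][, a]
by_result
#> [1] "b" "a"

# keyby: groups come back sorted by the grouping column
keyby_result <- x.dt[, .N, keyby = a][, a]
keyby_result
#> [1] "a" "b"

# No grouping or chaining: unique and sort evaluated inside j
sorted_unique <- x.dt[, sort(unique(a))]
sorted_unique
#> [1] "a" "b"
```

One caveat: data.table orders character columns in C-locale, while base sort() follows the session's collation, so the two approaches may order mixed-case or non-ASCII strings differently.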

ttrodrigz · August 12, 2019, 5:03pm

Excellent, thank you for the explanation.

