Greetings, I’ve come across a "data wrangling cheatsheet" and have been trying everything on it. However, I am not understanding how to use n_distinct. My goal is to identify the unique values of Species in the iris dataset. Is this possible with n_distinct or should I be using some other function?
Cheers,
Jason
Hi Jason,
First of all, you can actually paste the code from reprex right into the text box here on the community site. It will be on your clipboard after you generate the reprex, so it's just a matter of pasting it in (in this case, you also need to load the library to get the data; see the reprex FAQ for details).
n_distinct() will return the number of unique values, not the values themselves.
library(dplyr, warn.conflicts = FALSE)
dplyr::n_distinct(iris$Species)
#> [1] 3
dplyr::n_distinct(iris)
#> [1] 149
unique(iris$Species)
#> [1] setosa versicolor virginica
#> Levels: setosa versicolor virginica
Created on 2018-10-01 by the reprex package (v0.2.1.9000)
From the docs:
This [n_distinct()] is a faster and more concise equivalent of length(unique(x)).
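To make the equivalence from the docs concrete, here is a minimal sketch that runs both forms on the same column and compares them. The variable name x is just an illustration, and the sketch assumes dplyr is installed.

```r
library(dplyr, warn.conflicts = FALSE)

# Pull the column out as a plain vector (a factor with three levels)
x <- iris$Species

# Both calls count the unique values in x
n_distinct(x)
#> [1] 3
length(unique(x))
#> [1] 3

# The two counts agree
n_distinct(x) == length(unique(x))
#> [1] TRUE
```

If you want the values themselves rather than the count, unique(x) is still the function to reach for.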
Hi Mara,
Thanks for the response. My problem was the fact that I was using a " , " instead of a " $ " - as you can see below. For some reason I flaked and never thought to use it. Thanks to everyone for all the help.
Cheers,
Jason
dplyr::n_distinct(iris$Species)
#> [1] 3
unique(iris$Species)
#> [1] setosa versicolor virginica
#> Levels: setosa versicolor virginica
Created on 2018-10-02 by the reprex package (v0.2.1)
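For anyone who lands here with the same comma-versus-$ confusion: n_distinct() works on vectors, so the column either has to be extracted with $ or referred to inside a dplyr verb, where column names are in scope. The following is only a sketch of that second route, assuming dplyr is installed; the names counts, n_species and species are made up for illustration.

```r
library(dplyr, warn.conflicts = FALSE)

# Inside summarise(), Species resolves to the column of iris,
# so no $ is needed to count its distinct values
counts <- iris %>% summarise(n_species = n_distinct(Species))

# distinct() returns the unique values themselves,
# as a one-column data frame rather than a vector
species <- iris %>% distinct(Species)

counts
species
```

Whether this reads better than iris$Species is mostly a matter of taste; the $ form is shorter for a one-off check, while the verb form fits more naturally into a longer pipeline.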