Using n_distinct

dplyr

#1

Greetings, I’ve come across a "data wrangling cheatsheet" and have been trying everything on it. However, I don't understand how to use n_distinct. My goal is to identify the unique values of Species in the iris dataset. Is this possible with n_distinct, or should I be using some other function?
Cheers,
Jason


#2

Hi Jason,

First of all, you can paste the code from reprex straight into the text box here on the community site. It is copied to your clipboard when you generate the reprex, so it's just a matter of pasting it in. (In this case you also need to load dplyr inside the reprex itself, because reprex runs your code in a fresh R session; see the reprex FAQ for details.)

n_distinct() returns the number of unique values, not the values themselves. To list the values, use unique().

library(dplyr, warn.conflicts = FALSE)
dplyr::n_distinct(iris$Species)
#> [1] 3
dplyr::n_distinct(iris)
#> [1] 149

unique(iris$Species)
#> [1] setosa     versicolor virginica 
#> Levels: setosa versicolor virginica

Created on 2018-10-01 by the reprex package (v0.2.1.9000)
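Why does n_distinct(iris) give 149 rather than 150? When given a whole data frame, n_distinct() counts distinct rows, and iris contains one row that exactly repeats an earlier one. A short sketch to confirm this with base R's duplicated() and dplyr's distinct():

```r
library(dplyr, warn.conflicts = FALSE)

# Total rows in the built-in iris data frame
nrow(iris)
#> [1] 150

# Number of rows that exactly repeat an earlier row
sum(duplicated(iris))
#> [1] 1

# n_distinct() on a data frame counts unique rows,
# so it matches the row count after distinct()
n_distinct(iris) == nrow(distinct(iris))
#> [1] TRUE
```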

From the docs:

This [n_distinct()] is a faster and more concise equivalent of length(unique(x)).
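To make the quoted equivalence concrete, here is a small sketch on a made-up vector `x` (hypothetical data, not from the thread). It also shows n_distinct()'s `na.rm` argument, which controls whether NA is counted as a value:

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical example vector with a repeated value and missing values
x <- c("a", "b", "a", NA, NA)

# Same answer as length(unique(x)); NA counts as one distinct value
n_distinct(x)
#> [1] 3
length(unique(x))
#> [1] 3

# na.rm = TRUE drops missing values before counting
n_distinct(x, na.rm = TRUE)
#> [1] 2
```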


#3

Hi Mara,

Thanks for the response. My problem was that I was using a comma (,) where I needed a $; the working version is below. For some reason I flaked and never thought to use it. Thanks to everyone for all the help.

Cheers,
Jason

dplyr::n_distinct(iris$Species)
#> [1] 3
unique(iris$Species)
#> [1] setosa     versicolor virginica 
#> Levels: setosa versicolor virginica

Created on 2018-10-02 by the reprex package (v0.2.1)
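For context on the comma-versus-$ mix-up: outside a dplyr verb, the column has to be pulled out explicitly (iris$Species). Inside verbs such as summarise() and distinct(), the bare column name works because those functions look it up within the data frame. A sketch of both the count and the values done that way (the names `species_count` and `species_values` are just illustrative):

```r
library(dplyr, warn.conflicts = FALSE)

# Count of distinct species, computed inside a dplyr verb;
# the result is a one-row data frame with column n_species
species_count <- iris %>% summarise(n_species = n_distinct(Species))
species_count

# The distinct values themselves, returned as a one-column data frame
species_values <- iris %>% distinct(Species)
species_values
```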