Struggling with distinct()

I'm trying to get something simple using group_by, count & mutate without success.
I want an answer to - How many distinct invNo are there by custId?
The answer I'm after is -
custId invNo
1 2
2 2
Here are my attempts.

library (tidyverse)

df = data.frame (
  custId = c(1, 1,1,1,2,2,2,2),
  invNo = c(201, 201,202,202,301,301,302,302),
  item = c('a', 'b', 'a', 'c','a', 'b', 'a', 'c'),
  amount = c(10, 20, 10, 30, 10, 20, 10, 30)
)

glimpse(df)
df

df %>% count(item)
df %>% count(custId,invNo)
df %>% count(invNo, custId)

df %>% group_by(custId) %>% count(invNo) %>% distinct()
df %>% group_by(custId) %>% mutate(unity=1) %>% summarize(Freq=n(), ans=sum(unity))  %>% distinct()
```{r}

Welcome to the community!

For this type of question, n_distinct is more useful. See below:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

dataset <- data.frame (custId = c(1, 1, 1, 1, 2, 2, 2, 2),
                       invNo = c(201, 201, 202, 202, 301, 301, 302, 302),
                       item = c('a', 'b', 'a', 'c','a', 'b', 'a', 'c'),
                       amount = c(10, 20, 10, 30, 10, 20, 10, 30))

dataset %>%
  group_by(custId) %>%
  summarise(distinct_invNo_count = n_distinct(invNo))
#> # A tibble: 2 x 2
#>   custId distinct_invNo_count
#>    <dbl>                <int>
#> 1      1                    2
#> 2      2                    2

Created on 2019-07-02 by the reprex package (v0.3.0)

Hope this helps.

4 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.