Cumulative count of new values only


#1

Hello! Let's say I have a data set of someone's ATM visits, in order. I want to keep a running total that goes up each time the person visits a new ATM. If it's an ATM they've visited before, though, I don't want to count it.

I've cobbled together something that works for one person at a time below, but I'm not sure how I would turn this into something that could be applied across tens of thousands of people. Client A visits several different ATMs, while Client B just visits the same one each time.

Appreciate any help or tips, I'm a little lost on this one.

library(tidyverse)
#> Warning: package 'tidyverse' was built under R version 3.4.3
#> -- Attaching packages --------------------------------------------------------- tidyverse 1.2.1 --
#> v ggplot2 2.2.1     v purrr   0.2.5
#> v tibble  1.4.2     v dplyr   0.7.4
#> v tidyr   0.8.0     v stringr 1.3.1
#> v readr   1.1.1     v forcats 0.2.0
#> Warning: package 'tibble' was built under R version 3.4.3
#> Warning: package 'tidyr' was built under R version 3.4.3
#> Warning: package 'purrr' was built under R version 3.4.4
#> Warning: package 'dplyr' was built under R version 3.4.3
#> Warning: package 'stringr' was built under R version 3.4.4
#> -- Conflicts ------------------------------------------------------------ tidyverse_conflicts() --
#> x dplyr::filter() masks stats::filter()
#> x dplyr::lag()    masks stats::lag()

transactions <- tribble(
  ~client, ~day, ~ATM_location,
  #---------#----#-----#
  "A", 1L, "Bank",
  "A", 4L, "Elgin",
  "A", 10L, "Broadview",
  "A", 11L, "Broadview",
  "B", 1L, "Bank",
  "B", 3L, "Bank",
  "B", 5L, "Bank",
  "B", 6L, "Bank"
)

# count for one client
one_client <- transactions %>% 
  filter( client == "A" )

# create a function that increments a counter each time we see a new ATM for a client
# (i and been_to are globals that check_new() updates with <<-)
i <- 0
been_to <- c()

check_new <- function(x) {
  
  if (!x %in% been_to) { 
    i <<- i + 1
    been_to <<- c(x, been_to)
  }
  i
}

# count new atm visits for client "A"
res <- map( one_client$ATM_location, check_new )
# wrangle result
res %>% 
  enframe( value = "unique_atms" ) %>%
  select(-name) %>% 
  unnest %>% 
  bind_cols( one_client )
#> # A tibble: 4 x 4
#>   unique_atms client   day ATM_location
#>         <dbl> <chr>  <int> <chr>       
#> 1        1.00 A          1 Bank        
#> 2        2.00 A          4 Elgin       
#> 3        3.00 A         10 Broadview   
#> 4        3.00 A         11 Broadview

# count new atm visits for client "B"
one_client <- transactions %>% 
  filter( client == "B" )
# reset the counter and the visit history before moving to the next client
i <- 0
been_to <- c()
# map function over one client
res <- map( one_client$ATM_location, check_new )
# wrangle result
res %>% 
  enframe( value = "unique_atms") %>%
  select(-name) %>% 
  unnest %>% 
  bind_cols( one_client )
#> # A tibble: 4 x 4
#>   unique_atms client   day ATM_location
#>         <dbl> <chr>  <int> <chr>       
#> 1        1.00 B          1 Bank        
#> 2        1.00 B          3 Bank        
#> 3        1.00 B          5 Bank        
#> 4        1.00 B          6 Bank

#2

Hello adamk,

What about this solution?

library(tidyverse)
transactions <- tribble(
  ~client, ~day, ~ATM_location,
  #---------#----#-----#
  "A", 1L, "Bank",
  "A", 4L, "Elgin",
  "A", 10L, "Broadview",
  "A", 11L, "Broadview",
  "B", 1L, "Bank",
  "B", 3L, "Bank",
  "B", 5L, "Bank",
  "B", 6L, "Bank"
)

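transactions %>%
  group_by(client) %>%
  # TRUE the first time a client uses an ATM, FALSE on every repeat visit;
  # cumsum() then counts the TRUEs so far within each client
  mutate(unique_ATM_location = !duplicated(ATM_location),
         unique_atms = cumsum(unique_ATM_location)) %>%
  select(unique_atms, client, day, ATM_location)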
#> # A tibble: 8 x 4
#> # Groups:   client [2]
#>   unique_atms client   day ATM_location
#>         <int> <chr>  <int> <chr>       
#> 1           1 A          1 Bank        
#> 2           2 A          4 Elgin       
#> 3           3 A         10 Broadview   
#> 4           3 A         11 Broadview   
#> 5           1 B          1 Bank        
#> 6           1 B          3 Bank        
#> 7           1 B          5 Bank        
#> 8           1 B          6 Bank

Created on 2018-08-07 by the reprex package (v0.2.0).
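
In case it helps to see why this works, here is a minimal sketch of the same idea on plain vectors, without any packages. The names visits, is_new, client and atm are just made up for illustration, and it assumes the rows are already sorted in visit order within each client. The last part is one possible base R equivalent of the grouped version using ave(), not necessarily the best option for very large data.

```r
# One client's visits, in order (same values as client "A" above)
visits <- c("Bank", "Elgin", "Broadview", "Broadview")

# duplicated() flags every repeat of an earlier value, so negating it
# marks the first visit to each ATM
is_new <- !duplicated(visits)
is_new
#> [1]  TRUE  TRUE  TRUE FALSE

# cumsum() treats TRUE as 1 and FALSE as 0, which gives the running count
cumsum(is_new)
#> [1] 1 2 3 3

# Base R sketch for several clients at once (hypothetical vectors).
# ave() applies the function within each client and returns a vector of the
# same type as its first argument, so we pass integer row positions rather
# than the character column itself.
client <- c("A", "A", "A", "A", "B", "B", "B", "B")
atm    <- c("Bank", "Elgin", "Broadview", "Broadview",
            "Bank", "Bank", "Bank", "Bank")
unique_atms <- ave(seq_along(atm), client,
                   FUN = function(idx) cumsum(!duplicated(atm[idx])))
unique_atms
#> [1] 1 2 3 3 1 1 1 1
```

Because duplicated() only looks at earlier elements, the result depends on row order, so it is worth sorting by client and day first if the data might not be in order.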

Hope it helps or points you in the right direction!

Regards,


#3

That's exactly the thing I need. Thank you!