How to use loop to find last date of each customer in R

Hi all,
I have multiple entry records of customers in different day and i want to use loop to find their last visit day of every unique customer. i am a beginner in r, could anyone give me some advice on how to start. Below is the data set and result set i am looking forward to. Many thanks.

#Dataset
name <- c('Mary','Sue','Peter','Mary','Mary','John','Sue','Peter','Peter','John',
'John','John','Mary','Mary')
date <- c('01/04/2018','03/02/2017','01/01/2019','24/04/2017',
'02/03/2019','31/05/2019','08/09/2019','17/12/2019',
'02/08/2017','10/11/2017','30/12/2017','18/02/2018',
'18/02/2018','18/10/2019')

data <- data.frame(name,date)

data
name date
1 Mary 01/04/2018
2 Sue 03/02/2017
3 Peter 01/01/2019
4 Mary 24/04/2017
5 Mary 02/03/2019
6 John 31/05/2019
7 Sue 08/09/2019
8 Peter 17/12/2019
9 Peter 02/08/2017
10 John 10/11/2017
11 John 30/12/2017
12 John 18/02/2018
13 Mary 18/02/2018
14 Mary 18/10/2019

#Result Set

r_name <- c('Mary','Sue','Peter','John')
r_lastdate <- c('18/10/2019','08/09/2019','17/12/2019','31/05/2019')

r_data <- data.frame(r_name, r_lastdate)

r_data
r_name r_lastdate
1 Mary 18/10/2019
2 Sue 08/09/2019
3 Peter 17/12/2019
4 John 31/05/2019

Here's a tidyverse solution that doesn't involve a loop.

library(tidyverse)

data %>%
  mutate(date = as.Date(date, format = "%d/%m/%Y")) %>%
  group_by(name) %>%
  summarize(last_date = max(date))

# A tibble: 4 x 2
  name  last_date 
  <fct> <date>    
1 John  2019-05-31
2 Mary  2019-10-18
3 Peter 2019-12-17
4 Sue   2019-09-08
2 Likes

Hi Jay,
Thank you very much for your prompt reply and this is a very straight and quick solution. Really appreciated that.

Could I also ask you how to make two any other sub tables which

  1. Still have unique name as the row and list the trip 1,2,3,4,5 and so on of each person and hv the avg trip n grand total at last column n row.

For example

Trip 1 Trip2 Total trips
Mary. dd/mm/yyyy dd/mm/yyyy 2
John. dd/mm/yyyy. N/A 1
Total. 2 1 3

  1. The second table I want to know the lag days between trips. Lag is the day between trips.

For example

Lag1 Lag2 Avg.Lag
Mary. 3 4 3.5
John. 5 1 3

Thanks a lot for your help.

@Dlok, those are fairly different problems that would require more complicated solutions (splitting, munging, restacking, pivoting, etc.). I'm new enough to this platform that I'm not sure about the protocols, but on Stack Overflow, where I've spent more time, you would be encouraged to post new questions for new problems. So I'll do that here.

Hi Jay,
Thank you very much for your reply. I made a new post below.

Thanks

1 Like

I have not found solution yet, appreciate if you can take a look when you have time. Million thanks.

library(tidyverse)
library(lubridate)
name <- c('Mary','Sue','Peter','Mary','Mary','John','Sue','Peter','Peter','John',
          'John','John','Mary','Mary')
date <- c('01/04/2018','03/02/2017','01/01/2019','24/04/2017',
          '02/03/2019','31/05/2019','08/09/2019','17/12/2019',
          '02/08/2017','10/11/2017','30/12/2017','18/02/2018',
          '18/02/2018','18/10/2019')

data <- data.frame(name,
                   date=lubridate::dmy(date))

# Still have unique name as the row and list the trip 1,2,3,4,5 and 
# so on of each person and hv the avg trip n grand total at last column n row.
data2 <- arrange(data,
                 name,date) %>% 
  group_by(name) %>% 
  mutate(order=row_number())

times_df <- pivot_wider(data2,
                        id_cols=name,
                        names_from=order,
                        names_prefix = "time_",
                        values_from = "date") 

times_df_lgl <- mutate_all(times_df,
                           as.logical)
times_df_lgl$num <- rowSums(times_df_lgl[names(times_df_lgl)[-1]],na.rm = TRUE)
(avg_times<-mean(times_df_lgl$num,na.rm = TRUE))
(sum_times<-sum(times_df_lgl$num,na.rm = TRUE))
times_df$num<-times_df_lgl$num
jam_it_together <- mutate_all(times_df,
                              as.character)
jam_it_together <- bind_rows(jam_it_together,
                         mutate_all(data.frame(
                                    name="Total",
                                    time_1=sum_times,
                                    time_2="Avg",
                                    time_3=avg_times),as.character))

# > jam_it_together
# # A tibble: 5 x 7
# # Groups:   name [5]
# name  time_1     time_2     time_3     time_4     time_5     num  
# <chr> <chr>      <chr>      <chr>      <chr>      <chr>      <chr>
# 1 John  2017-11-10 2017-12-30 2018-02-18 2019-05-31 NA         4    
# 2 Mary  2017-04-24 2018-02-18 2018-04-01 2019-03-02 2019-10-18 5    
# 3 Peter 2017-08-02 2019-01-01 2019-12-17 NA         NA         3    
# 4 Sue   2017-02-03 2019-09-08 NA         NA         NA         2    
# 5 Total 14         Avg        3.5        NA         NA         NA 

# The second table I want to know the lag days between trips. Lag is the day between trips.
data3 <- mutate(data2,
                lead_date=lead(date),
                days_between = lead_date-date
                )

data4 <-  pivot_wider(data3 %>% filter(!is.na(days_between)),
                      id_cols=name,
                      names_from=order,
                      names_prefix = "lag_",
                      values_from = "days_between") 
#from here follow a similar approach to example 1

I don't know why I always encounter this problem

  > data2 <- arrange(data,
+                  name,date) %>% 
+   group_by(name) %>% 
+   mutate(order=row_number())
Error: row_number() should only be called in a data context
Run `rlang::last_error()` to see where the error occurred.

is that on your full data that you havent shared, or the example data that I shared back with you ? or does it error on both ?

you should do sessionInfo() in console so we can check version numbers.
If I restart R, (Ctrl+Shft+F10) run the code , and then sessionInfo
my sessionInfo is()

> sessionInfo()
R version 3.6.2 (2019-12-12)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252    LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C                            LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] lubridate_1.7.4 forcats_0.4.0   stringr_1.4.0   dplyr_0.8.5     purrr_0.3.3     readr_1.3.1     tidyr_1.0.2     tibble_3.0.0    ggplot2_3.2.1  
[10] tidyverse_1.3.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4       cellranger_1.1.0 pillar_1.4.3     compiler_3.6.2   dbplyr_1.4.2     tools_3.6.2      jsonlite_1.6.1   lifecycle_0.2.0  nlme_3.1-142    
[10] gtable_0.3.0     lattice_0.20-38  pkgconfig_2.0.3  rlang_0.4.5      reprex_0.3.0     cli_2.0.2        DBI_1.1.0        rstudioapi_0.11  haven_2.2.0     
[19] withr_2.1.2      xml2_1.2.2       httr_1.4.1       fs_1.3.1         generics_0.0.2   vctrs_0.2.4      hms_0.5.3        grid_3.6.2       tidyselect_1.0.0
[28] glue_1.4.0       R6_2.4.1         fansi_0.4.1      readxl_1.3.1     modelr_0.1.6     magrittr_1.5     backports_1.1.5  scales_1.1.0     ellipsis_0.3.0  
[37] rvest_0.3.5      assertthat_0.2.1 colorspace_1.4-1 stringi_1.4.6    lazyeval_0.2.2   munsell_0.5.0    broom_0.5.4      crayon_1.3.4

i just copied your code so its your data.


> sessionInfo()
R version 3.6.2 (2019-12-12)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252   
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] lubridate_1.7.4       tidyr_1.0.2           openair_2.7-2         stringr_1.4.0        
 [5] ztable_0.2.0          epiDisplay_3.5.0.1    nnet_7.3-12           MASS_7.3-51.4        
 [9] survival_3.1-8        foreign_0.8-72        plyr_1.8.5            reshape_0.8.8        
[13] ggpubr_0.2.5          magrittr_1.5          rlang_0.4.4           tibble_2.1.3         
[17] arsenal_3.4.0         gapminder_0.3.0       yaml_2.2.1            highcharter_0.7.0    
[21] openintro_1.7.1       dplyr_0.8.4           plotly_4.9.2          rpivotTable_0.3.0    
[25] DT_0.12               knitr_1.28            flexdashboard_0.5.1.1 hrbrthemes_0.6.0     
[29] viridis_0.5.1         viridisLite_0.3.0     ggplot2_3.2.1         tmap_2.3-2           
[33] rgdal_1.4-8           sp_1.3-2              RSQLite_2.2.0        

loaded via a namespace (and not attached):
  [1] colorspace_1.4-1    ggsignif_0.6.0      class_7.3-15        leaflet_2.0.3      
  [5] flextable_0.5.8     base64enc_0.1-3     dichromat_2.0-0     rstudioapi_0.11    
  [9] hexbin_1.28.1       bit64_0.9-7         fansi_0.4.1         xml2_1.2.2         
 [13] codetools_0.2-16    splines_3.6.2       extrafont_0.17      rlist_0.4.6.1      
 [17] jsonlite_1.6.1      tmaptools_2.0-2     broom_0.5.4         Rttf2pt1_1.3.8     
 [21] cluster_2.1.0       png_0.1-7           rgeos_0.5-2         shiny_1.4.0        
 [25] readr_1.3.1         mapproj_1.2.7       compiler_3.6.2      httr_1.4.1         
 [29] backports_1.1.5     assertthat_0.2.1    Matrix_1.2-18       fastmap_1.0.1      
 [33] lazyeval_0.2.2      cli_2.0.1           later_1.0.0         htmltools_0.4.0    
 [37] tools_3.6.2         igraph_1.2.4.2      gtable_0.3.0        glue_1.3.1         
 [41] maps_3.3.0          Rcpp_1.0.3          raster_3.0-12       vctrs_0.2.2        
 [45] nlme_3.1-142        extrafontdb_1.0     leafsync_0.1.0      crosstalk_1.0.0    
 [49] lwgeom_0.2-1        xfun_0.12           mime_0.9            lifecycle_0.1.0    
 [53] XML_3.99-0.3        zoo_1.8-7           scales_1.1.0        hms_0.5.3          
 [57] promises_1.1.0      RColorBrewer_1.1-2  quantmod_0.4-15     curl_4.3           
 [61] memoise_1.1.0       gridExtra_2.3       gdtools_0.2.1       latticeExtra_0.6-29
 [65] stringi_1.4.4       e1071_1.7-3         TTR_0.23-6          zip_2.0.4          
 [69] pkgconfig_2.0.3     systemfonts_0.1.1   evaluate_0.14       lattice_0.20-38    
 [73] purrr_0.3.3         sf_0.8-1            htmlwidgets_1.5.1   bit_1.1-15.2       
 [77] tidyselect_1.0.0    R6_2.4.1            generics_0.0.2      DBI_1.1.0          
 [81] mgcv_1.8-31         pillar_1.4.3        whisker_0.4         withr_2.1.2        
 [85] units_0.6-5         xts_0.12-0          crayon_1.3.4        utf8_1.1.4         
 [89] uuid_0.1-2          KernSmooth_2.23-16  rmarkdown_2.1       officer_0.3.6      
 [93] jpeg_0.1-8.1        grid_3.6.2          data.table_1.12.8   blob_1.2.1         
 [97] digest_0.6.23       classInt_0.4-2      xtable_1.8-4        httpuv_1.5.2       
[101] munsell_0.5.0      
>

thats odd, i dont see tidyverse on your list at all.
your tibble is a full version out of date. ( ha this is only 6 days since 3.0 was released but still)
perhaps install.packages("tidyverse") would clear things up for you.

Any problem with this warning below? :sweat_smile:

library(tidyverse)
-- Attaching packages --------------------------------------- tidyverse 1.3.0 --
v readr 1.3.1 v forcats 0.4.0
v purrr 0.3.3
-- Conflicts ------------------------------------------ tidyverse_conflicts() --
x purrr::%@%() masks rlang::%@%()
x epiDisplay::alpha() masks ggplot2::alpha()
x plyr::arrange() masks dplyr::arrange(), plotly::arrange()
x lubridate::as.difftime() masks base::as.difftime()
x purrr::as_function() masks rlang::as_function()
x purrr::compact() masks plyr::compact()
x plyr::count() masks dplyr::count()
x lubridate::date() masks base::date()
x tidyr::expand() masks reshape::expand()
x tidyr::extract() masks magrittr::extract()
x plyr::failwith() masks dplyr::failwith()
x dplyr::filter() masks plotly::filter(), stats::filter()
x purrr::flatten() masks rlang::flatten()
x purrr::flatten_chr() masks rlang::flatten_chr()
x purrr::flatten_dbl() masks rlang::flatten_dbl()
x purrr::flatten_int() masks rlang::flatten_int()
x purrr::flatten_lgl() masks rlang::flatten_lgl()
x purrr::flatten_raw() masks rlang::flatten_raw()
x lubridate::here() masks plyr::here()
x plyr::id() masks dplyr::id()
x lubridate::intersect() masks base::intersect()
x purrr::invoke() masks rlang::invoke()
x lubridate::is.Date() masks arsenal::is.Date()
x dplyr::lag() masks stats::lag()
x purrr::list_along() masks rlang::list_along()
x purrr::modify() masks rlang::modify()
x plyr::mutate() masks ggpubr::mutate(), dplyr::mutate(), plotly::mutate()
x purrr::prepend() masks rlang::prepend()
x plyr::rename() masks reshape::rename(), dplyr::rename(), plotly::rename()
x MASS::select() masks dplyr::select(), plotly::select()
x purrr::set_names() masks magrittr::set_names(), rlang::set_names()
x lubridate::setdiff() masks base::setdiff()
x purrr::splice() masks rlang::splice()
x lubridate::stamp() masks reshape::stamp()
x plyr::summarise() masks dplyr::summarise(), plotly::summarise()
x plyr::summarize() masks dplyr::summarize()
x lubridate::union() masks base::union()
Warning message:
package ‘tidyverse’ was built under R version 3.6.3

I wouldnt think so, but the proof will be in the pudding when you rerun our code

Here you go, but still seeing the same warning msg.

> name <- c('Mary','Sue','Peter','Mary','Mary','John','Sue','Peter','Peter','John',
+           'John','John','Mary','Mary')
> date <- c('01/04/2018','03/02/2017','01/01/2019','24/04/2017',
+           '02/03/2019','31/05/2019','08/09/2019','17/12/2019',
+           '02/08/2017','10/11/2017','30/12/2017','18/02/2018',
+           '18/02/2018','18/10/2019')
> 
> data <- data.frame(name,
+                    date=lubridate::dmy(date))
> 
> # Still have unique name as the row and list the trip 1,2,3,4,5 and 
> # so on of each person and hv the avg trip n grand total at last column n row.
> data2 <- arrange(data,
+                  name,date) %>% 
+   group_by(name) %>% 
+   mutate(order=row_number())
Error: row_number() should only be called in a data context
Run `rlang::last_error()` to see where the error occurred.
> 
> times_df <- pivot_wider(data2,
+                         id_cols=name,
+                         names_from=order,
+                         names_prefix = "time_",
+                         values_from = "date") 
Error in tbl_vars_dispatch(x) : object 'data2' not found
> 
> times_df_lgl <- mutate_all(times_df,
+                            as.logical)
Error in is_grouped_df(tbl) : object 'times_df' not found
> times_df_lgl$num <- rowSums(times_df_lgl[names(times_df_lgl)[-1]],na.rm = TRUE)
Error in is.data.frame(x) : object 'times_df_lgl' not found
> (avg_times<-mean(times_df_lgl$num,na.rm = TRUE))
Error in mean(times_df_lgl$num, na.rm = TRUE) : 
  object 'times_df_lgl' not found
> (sum_times<-sum(times_df_lgl$num,na.rm = TRUE))
Error: object 'times_df_lgl' not found
> times_df$num<-times_df_lgl$num
Error: object 'times_df_lgl' not found
> jam_it_together <- mutate_all(times_df,
+                               as.character)
Error in is_grouped_df(tbl) : object 'times_df' not found
> jam_it_together <- bind_rows(jam_it_together,
+                              mutate_all(data.frame(
+                                name="Total",
+                                time_1=sum_times,
+                                time_2="Avg",
+                                time_3=avg_times),as.character))
Error in dots_values(...) : object 'jam_it_together' not found
> 
> # > jam_it_together
> # # A tibble: 5 x 7
> # # Groups:   name [5]
> # name  time_1     time_2     time_3     time_4     time_5     num  
> # <chr> <chr>      <chr>      <chr>      <chr>      <chr>      <chr>
> # 1 John  2017-11-10 2017-12-30 2018-02-18 2019-05-31 NA         4    
> # 2 Mary  2017-04-24 2018-02-18 2018-04-01 2019-03-02 2019-10-18 5    
> # 3 Peter 2017-08-02 2019-01-01 2019-12-17 NA         NA         3    
> # 4 Sue   2017-02-03 2019-09-08 NA         NA         NA         2    
> # 5 Total 14         Avg        3.5        NA         NA         NA 
> 
> # The second table I want to know the lag days between trips. Lag is the day between trips.
> data3 <- mutate(data2,
+                 lead_date=lead(date),
+                 days_between = lead_date-date
+ )
Error in is.data.frame(.data) : object 'data2' not found
> 
> data4 <-  pivot_wider(data3 %>% filter(!is.na(days_between)),
+                       id_cols=name,
+                       names_from=order,
+                       names_prefix = "lag_",
+                       values_from = "days_between") 
Error in eval(lhs, parent, parent) : object 'data3' not found
> #from here follow a similar approach to example 1
> library(tidyverse)
> library(lubridate)
> name <- c('Mary','Sue','Peter','Mary','Mary','John','Sue','Peter','Peter','John',
+           'John','John','Mary','Mary')
> date <- c('01/04/2018','03/02/2017','01/01/2019','24/04/2017',
+           '02/03/2019','31/05/2019','08/09/2019','17/12/2019',
+           '02/08/2017','10/11/2017','30/12/2017','18/02/2018',
+           '18/02/2018','18/10/2019')
> 
> data <- data.frame(name,
+                    date=lubridate::dmy(date))
> 
> # Still have unique name as the row and list the trip 1,2,3,4,5 and 
> # so on of each person and hv the avg trip n grand total at last column n row.
> data2 <- arrange(data,
+                  name,date) %>% 
+   group_by(name) %>% 
+   mutate(order=row_number())
Error: row_number() should only be called in a data context
Run `rlang::last_error()` to see where the error occurred.
> rlang::last_error()
<error/rlang_error>
row_number() should only be called in a data context
Backtrace:
  1. plyr::arrange(data, name, date)
  1. dplyr::group_by(., name)
  9. plyr::mutate(., order = row_number())
 10. [ base::eval(...) ] with 1 more call
 12. dplyr::row_number()
 13. dplyr:::from_context("..group_size")
 14. `%||%`(...)
Run `rlang::last_trace()` to see the full context.
> rlang::last_trace()
<error/rlang_error>
row_number() should only be called in a data context
Backtrace:
     x
  1. \-arrange(data, name, date) %>% group_by(name) %>% mutate(order = row_number())
  2.   +-base::withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
  3.   \-base::eval(quote(`_fseq`(`_lhs`)), env, env)
  4.     \-base::eval(quote(`_fseq`(`_lhs`)), env, env)
  5.       \-`_fseq`(`_lhs`)
  6.         \-magrittr::freduce(value, `_function_list`)
  7.           +-base::withVisible(function_list[[k]](value))
  8.           \-function_list[[k]](value)
  9.             \-plyr::mutate(., order = row_number())
 10.               \-base::eval(cols[[col]], .data, parent.frame())
 11.                 \-base::eval(cols[[col]], .data, parent.frame())
 12.                   \-dplyr::row_number()
 13.                     \-dplyr:::from_context("..group_size")
 14.                       \-`%||%`(...)

could you please Restart R and run again.
sometimes R sessions get corrupted / one of the many libraries you had loaded was masking something improperly.
image

1 Like

Thank you very much. It works now. :grin: :grin: :grin: :grin: :+1: :+1: :+1:

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.