Conditional date difference grouped by patient ID

Hi there, I've been struggling with what I thought would be a simple calculation but has proven to be out of my basic skill set perhaps. I have looked at many other examples of a similar problem but I haven't been able to make any of them work, so here goes:

I have a dataset (about 504 rows) consisting of multiple visits per patient. each visit date has been documented and I need to find out the time interval between each visit date, per patient. I'd like to document this in a new column "visit_diff" which will show the interval between visit 1 and 2, visit 2 and 3, etc. By my logic, the row in visit_diff corresponding to visit 1 should equal NA, as some patients have only come for one visit.

I've tried to group_by patient id but get the error: "...must be length 2 (the group size) or one, not 504".
I've tried using an ifelse statement but couldn't get the condition calculation worded correctly (if the patient id is equal to the one above it then subtract the visit date with a lag of 1).

> df
# A tibble: 504 x 4
# Groups:   pd_id [212]
      pd_id visit                     visit_diff visit_date
      <dbl> <ord>                  <dbl> <date>    
 1 10200001 visit 1                   NA 2018-04-10
 2 10200001 visit 2                   NA 2018-06-18
 3 10200002 visit 1                   NA 2018-04-17
 4 10200002 visit 2                   NA 2018-06-21
 5 10200002 visit 3                   NA 2019-03-08
 6 10200002 visit 4                   NA 2019-07-05
 7 10200003 visit 1                   NA 2018-04-19
 8 10200003 visit 2                   NA 2018-06-19
 9 10200003 visit 3                   NA 2019-04-10
10 10200004 visit 1                  NA 2018-04-19
# ... with 494 more rows

I hope this is reprex friendly.
Thanks in advance!

Is this what you are looking for?

library(dplyr)

df <- read.csv("/home/fjcc/R/Play/patients.csv")
df <- df %>% mutate(Date = as.Date(Date))
df2 <- df %>% group_by(id) %>% 
  mutate(LagDate = lag(Date), Diff = Date - LagDate)
df2
#> # A tibble: 10 x 5
#> # Groups:   id [4]
#>          id Visit      Date       LagDate    Diff    
#>       <int> <fct>      <date>     <date>     <time>  
#>  1 10200001 " visit 1" 2018-04-10 NA          NA days
#>  2 10200001 " visit 2" 2018-06-18 2018-04-10  69 days
#>  3 10200002 " visit 1" 2018-04-17 NA          NA days
#>  4 10200002 " visit 2" 2018-06-21 2018-04-17  65 days
#>  5 10200002 " visit 3" 2019-03-08 2018-06-21 260 days
#>  6 10200002 " visit 4" 2019-07-05 2019-03-08 119 days
#>  7 10200003 " visit 1" 2018-04-19 NA          NA days
#>  8 10200003 " visit 2" 2018-06-19 2018-04-19  61 days
#>  9 10200003 " visit 3" 2019-04-10 2018-06-19 295 days
#> 10 10200004 " visit 1" 2018-04-19 NA          NA days

Created on 2019-08-12 by the reprex package (v0.2.1)

2 Likes

YES - thank you!!!:smiley:

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.