Writing an IF AND code based on 4 different columns with multiple identical rows.

Hi,

Welcome to the RStudio community!

Tip, when you post, try to generate a reprex: A reprex consists of the minimal code and data needed to recreate the issue/question you're having. You can find instructions how to build and share one here:

If I understand your questions correctly, what you want can be solved by some functions from the tidyverse package (especially the lag / lead ones).

library(tidyverse)

myData = data.frame(
  ID = rep(1:2, each = 3),
  Names = LETTERS[1:3],
  Start_Date = as.Date(c("20-4-2021", "21-4-2021", "22-5-2021"), format = "%d-%m-%Y"),
  End_Date = as.Date(c("20-5-2021", "22-5-2021", "25-5-2021"), format = "%d-%m-%Y")
)

myData = myData %>% 
  group_by(ID) %>% 
  mutate(
    timeDiff = difftime(lag(Start_Date), End_Date, units = "days")
  )

myData
#> # A tibble: 6 x 5
#> # Groups:   ID [2]
#>      ID Names Start_Date End_Date   timeDiff
#>   <int> <chr> <date>     <date>     <drtn>  
#> 1     1 A     2021-04-20 2021-05-20  NA days
#> 2     1 B     2021-04-21 2021-05-22 -32 days
#> 3     1 C     2021-05-22 2021-05-25 -34 days
#> 4     2 A     2021-04-20 2021-05-20  NA days
#> 5     2 B     2021-04-21 2021-05-22 -32 days
#> 6     2 C     2021-05-22 2021-05-25 -34 days

Created on 2021-04-21 by the reprex package (v2.0.0)

  • The group_by function makes sure we only evaluate the data per ID (I created an ID 2 to show this)
  • I assume you always want to make the difference between the current and previous row, so this can be done using the lag function which shifts data of certain columns by any amount (default 1). Not the NA values where there is no previous row.
  • The difftime function actually calculated the different in days

If you like to learn more about the Tidyverse, check this link

Hope this helps,
PJ