Changing dates to num days

soysauce93 · February 22, 2023, 10:08am

Hi. I am trying to do a regression on some data but I only have the timepoints in date format. I would like to change them to day number but have no idea how. What seems to be complicating things is that I have separate rows for each data point. I.e. here rows 1-2 refer to the same patient (SID 28) but 3 different results. How can I change the "date" column here to day number with respect to the first row for each SID?

          SID Date                                 <chr>                     <chr>     <dbl> <chr>                       
 1         28 2019-09-12 00:00:00    1                         NO         2.07 NO                          
 2         28 2018-09-13 00:00:00    0                         NO         2.63 NO                          
 3         28 2022-02-24 00:00:00    0                         NO         2.25 NO                          
 4         52 2021-04-22 00:00:00    0                         NO         1.38 NO                          
 5         52 2020-10-01 00:00:00    1                         NO        NA    NO                          
 6         52 2019-09-26 00:00:00    1                         NO         1.56 NO                          
 7         52 2018-09-27 00:00:00    0                         NO         1.65 NO                          
 8         52 2022-05-19 00:00:00    0                         Yes       NA    Yes                         
 9         52 2023-01-05 00:00:00    1                         NO         1.26 NO                          
10         88 2019-01-03 00:00:00    3                         NO        NA    NO

jrkrideau · February 22, 2023, 11:32am

Hi welcome to the forum.

I don't understand the question, possibly because I do not understand the structure of your data.

Would you supply it in an easily usable format? A handy way to supply some sample data is the dput() function. In the case of a large dataset something like dput(head(mydata, 100)) should supply the data we need. Just do dput(mydata) where mydata is your data. Copy the output and paste it here.

Thanks

andresrcs · February 22, 2023, 7:49pm

If I understand you correctly, it would be something like this:

library(dplyr)

your_data_frame %>%
    group_by(SID) %>% 
    mutate(day_number = row_number())

If you need more specific help, please provide a proper REPRoducible EXample (reprex) illustrating your issue.

technocrat · February 28, 2023, 8:08am

I would do it this way, to get successive observations to keep multiple observations for each patient unless all but the first were spurious.

library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union
d <- data.frame(
  SID = c(28, 28, 52, 52, 52, 52, 52, 52, 88),
  Date = c("2018-09-13 00:00:00", "2022-02-24 00:00:00", 
           "2021-04-22 00:00:00", "2020-10-01 00:00:00", 
           "2019-09-26 00:00:00", "2018-09-27 00:00:00", 
           "2022-05-19 00:00:00", "2023-01-05 00:00:00", 
           "2019-01-03 00:00:00"),
  W = c(0, 0, 0, 1, 1, 0, 0, 1, 3),
  X = c(FALSE, FALSE, FALSE, FALSE, FALSE, 
        FALSE, TRUE, FALSE, FALSE),
  Y = c(2.63, 2.25, 1.38, NA, 1.56, 1.65, NA, 1.26, NA),
  Z = c(FALSE, FALSE, FALSE, FALSE, FALSE, 
        FALSE, TRUE, FALSE, FALSE))

d[2] <- ymd_hms(d[2][[1]])
begin <- min(d[2][[1]])

d$dayno <- difftime(d[2][[1]],begin, units = "days") 
attributes(d$dayno) <- NULL
d <- d[order(d$dayno),]
d <- d[c(7,1:6)]
d
#>   dayno SID       Date W     X    Y     Z
#> 1     0  28 2018-09-13 0 FALSE 2.63 FALSE
#> 6    14  52 2018-09-27 0 FALSE 1.65 FALSE
#> 9   112  88 2019-01-03 3 FALSE   NA FALSE
#> 5   378  52 2019-09-26 1 FALSE 1.56 FALSE
#> 4   749  52 2020-10-01 1 FALSE   NA FALSE
#> 3   952  52 2021-04-22 0 FALSE 1.38 FALSE
#> 2  1260  28 2022-02-24 0 FALSE 2.25 FALSE
#> 7  1344  52 2022-05-19 0  TRUE   NA  TRUE
#> 8  1575  52 2023-01-05 1 FALSE 1.26 FALSE

^{Created on 2023-02-28 with reprex v2.0.2}

nirgrahamuk · February 28, 2023, 9:33am

I would recommend the following approach.
There is the option to order the dates, so as to avoid negatives for when for a SID subsequent dates are before (rather than after) the initial date for the SID.

d <- data.frame(
  SID = c(28, 28, 52, 52, 52, 52, 52, 52, 88),
  Date = c(
    "2018-09-13 00:00:00", "2022-02-24 00:00:00",
    "2021-04-22 00:00:00", "2020-10-01 00:00:00",
    "2019-09-26 00:00:00", "2018-09-27 00:00:00",
    "2022-05-19 00:00:00", "2023-01-05 00:00:00",
    "2019-01-03 00:00:00"
  ),
  W = c(0, 0, 0, 1, 1, 0, 0, 1, 3),
  X = c(
    FALSE, FALSE, FALSE, FALSE, FALSE,
    FALSE, TRUE, FALSE, FALSE
  ),
  Y = c(2.63, 2.25, 1.38, NA, 1.56, 1.65, NA, 1.26, NA),
  Z = c(
    FALSE, FALSE, FALSE, FALSE, FALSE,
    FALSE, TRUE, FALSE, FALSE
  )
)
library(tidyverse)
d |> #arrange(SID,Date) |> 
  group_by(SID) |>
  mutate(
    days_since_first =
      as.numeric(difftime(
        Date, first(Date),
        units = "day"
      ))
  )

uncomment the #arrange() in order to get that behaviour

system · April 11, 2023, 9:33am

This topic was automatically closed 42 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.