Unable to create a new column (ride_length) using two date columns(Started_at),(Ended_at).

Currently I'm working on Divvy dataset Google capstone project. I am a novice in this field.
This Dataset contains data about the ride. I have combined 12 files of previous year to make a huge dataset.

Masterfile <- read.csv('Masterfile.csv')

I'm struggling to form a new column 'ride_length' which i can get by subtracting data of column "Started_at" and "Ended_it".
These columns contains date "02-07-21 14:44" and "02-07-21 15:19" in this format.
I even converted both the columns in Numeric.

***Masterfile$ended_at <- as.numeric(Masterfile$ended_at, origin = "02-07-21 15:19")
Masterfile$started_at <- as.numeric(Masterfile$started_at , origin = "02-07-21 14:44") ***

When i Didn't supply origin I was getting this error-

"Error in as.POSIXct.numeric(time1) : 'origin' must be supplied"

But when I did , I got the following error-

Error in difftime(Masterfile$ended_at, Masterfile$ended_at, origin = "02-07-21 15:19") :

I might be doing silly mistake kindly help me . I am just stuck.

But I am unable to do it.

You can leave the columns in character format and use difftime().

d = data.frame(Started_at = '02-07-21 14:44',
               Ended_at = '02-07-21 15:19')

d$ride_length = difftime(d$Ended_at, d$Started_at)

d
#>       Started_at       Ended_at ride_length
#> 1 02-07-21 14:44 02-07-21 15:19     35 mins

Created on 2022-09-22 with reprex v2.0.2.9000

This will only be the case if all start/end pairs are within the same day since the actual date is not being parsed (the dates are being parsed as %y%m%d), I think this is a dangerous approach since it can lead to unexpected results, for example:

difftime('02-08-21 00:19', '02-07-21 23:44')
#> Time difference of 30.02431 days

Created on 2022-09-22 with reprex v2.0.2

I think it is better to properly parse the character strings into date-time objects respecting the actual date format they are in (I'm assuming "%m-%d-%y).

library(dplyr)

d = data.frame(Started_at = c('02-07-21 14:44', '02-07-21 23:44'),
               Ended_at = c('02-07-21 15:19', '02-08-21 00:19'))

d %>% 
    mutate(across(c(Started_at, Ended_at), as.POSIXct, format = "%m-%d-%y %H:%M"),
           ride_length = difftime(Ended_at, Started_at))
#>            Started_at            Ended_at ride_length
#> 1 2021-02-07 14:44:00 2021-02-07 15:19:00     35 mins
#> 2 2021-02-07 23:44:00 2021-02-08 00:19:00     35 mins

Created on 2022-09-22 with reprex v2.0.2

1 Like

I have tried the solution. It is not working.

Masterfile$ridelength <- difftime(Masterfile$ended_at , Masterfile$started_at)

I'm getting the following error:

Error in as.POSIXlt.character(as.character(x), ...) :
character string is not in a standard unambiguous format

See @andresrcs solution, which specifically sets the inputs as date-time objects and specifies the exact format prior to using difftime().

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.