37:30:55 Excel format compatible in RStudio???

Hi All,

I have an Excel file (now CSV) that I'm importing to RStudio, however there is a column that's formatted as 37:30:55 that is not recognized in RStudio. For those that don't know what 37:30:55 is, I believe it is a format to calculate time duration over 24hrs. I think RStudio is recognizing it as time and not duration.

The column itself is the duration of a rental in hours. I start to see issues when hours go over 24 hours.

Here is a screenshot of what is happening.
Screen Shot 2023-08-11 at 1.33.53 PM

The actual variable is typeof() character. Assuming that it is in the form hours:minutes:seconds and if you are interested in finding those rows with more than 24 hours

input <- "37:30:55"
revised <- as.numeric(gsub(":.*$","",input))
bigger <- function(x) x > 24
bigger(revised)
#> [1] TRUE

Created on 2023-08-11 with reprex v2.0.2

1 Like

I'm not sure is this is responsive to your question, but as @technocrat points out actual comes in as a character. You can use the lubridate package to convert this to hours minutes and seconds. Something like

library(lubridate)
result <- hms(actual)
howManyHours <- hours(result)
1 Like

Hi Technocrat,

Once again, thank you for your input. However I might be a little lost. My knowledge of R is still in its infancy and your solution may be beyond the scope of my knowledge. I can barely understand the R script you wrote.

I'm not interested in finding the rows > 24hrs. I think the issue is that R is finding a problem with rows greater than 24 hrs because of the format and being unable to read and load the data with read_csv.

I tried to view columns with functions like head & glimpse but because of the problems nothing is shown except the name of the original CSV files.

Should or can this column be edited during the read_csv function?

Hi Startz,

Thanks for your response. Are you suggesting that I change type for column 14 after I read_csv to hms(actual). I'm not trying find a certain amount of hours, I'm just trying to include all the data. I might just have to change the format of Column 14 before or after but don't know how to do that.

Why don't you post the first dozen rows of your Excel file. Maybe someone can spot something funny. Also, the exact command you used to read it in.

revised will give you the number of hours, neglecting minutes and seconds. If you are looking to round, come back.

I don't think hour is working. I get

"135055H 0M 0S"

My suggestion would be to convert time value result into a numeric value expressing the time in seconds. Then @ Gmillan1219 can do numeric calculations on it

library(lubridate)
result  <- hms("37:30:55")
timm  <- period_to_seconds(hms(result)) 

Thus

`timm / 60` gives us rental time in minutes or `tim / (60 * 60)` wil give rental time in hours

@jrkrideau is right. I should have written hour() rather than hours().

library(lubridate)
hour(hms("37:30:55"))
[1] 37

Hey everyone,

I've encountered this very issue before. It seems like the column in your Excel file is displaying the duration in a "days:hours:minutes" format, which is quite typical in Excel for durations longer than 24 hours. However, when imported into RStudio, it's being read as a time, which can cause some confusion.

Here's a potential way to handle this in R:

Import the CSV file as usual. Let's say the problematic column is called "Duration."
Use the strsplit function to split the column into three separate columns (days, hours, and minutes).
Convert these columns into numeric and then combine them into a single duration column in hours.
Here's a bit of code that might help:

Assuming df is your data frame and Duration is the column

duration_split <- strsplit(as.character(df$Duration), ":")
df$Days <- sapply(duration_split, [, 1)
df$Hours <- sapply(duration_split, [, 2)
df$Minutes <- sapply(duration_split, [, 3)

Convert to numeric

df$Days <- as.numeric(df$Days)
df$Hours <- as.numeric(df$Hours)
df$Minutes <- as.numeric(df$Minutes) / 60 # convert minutes to fraction of hour

Combine into single duration column in hours

df$TotalHours <- df$Days * 24 + df$Hours + df$Minutes

Optional: remove the split columns

df$Days <- NULL
df$Hours <- NULL
df$Minutes <- NULL

This way, TotalHours should now have the correct duration in hours. Do give it a try and see if it helps.

Cheers and happy coding!
Ahmad

HI Startz,

The exact command I'm using to read the data frame is:

read_csv(case_study_1)

case_study_1 is a data frame with 12 CSV files assigned to it representing the data for 12 months.

Below is a screen shot of one of the CSV files. The column I am having issues with is column N. Column N represents the time difference between columns D & C. The data is about the time length of rented bicycles.
Some of the time differences exceed 24 hours and could be 108:36:04 (which can be seen in the very first screenshot of this post) meaning 108 hours 36 minutes and 4 seconds. So I believe that once it exceeds 23:59:59 RStudio can't interpret it because its working with a 24 hour time span.

Hi fawadalam4514,

This may be a silly question but can I do this even though the data had problems importing? I'm very new to RStudio and not sure if data could be manipulated even after a failed read_csv.

Also, the data is hh:mm:ss its just that when the hours go over 23:59:59 RStudio cannot interpret it because its working with a 24 hr time span.

Anyway I'm going to try your recommendations. Thanks once again

I created a little csv file

a,b,ride_length
7/10/2011 22:55,7/10/2022 23:01,108:36:04

and read it in with ```zz<-read_csv("foo.csv"). It read in fine and I got

> hour(hms(zz$ride_length))
[1] 108

which suggests the problem is something else...but doesn't help much saying what the problem is.

I second @ startz.

I think the problem is in the files to be imported, not in what you are doing. But what is another matter. Can you link us to the files?

Hi Jrkideau,

Heres the link: Bucket loading...

I'm working with the August 5th 2022 files to July 2023 files which is 12 months. I provided a screenshot below.

Hi Startz,

You may be right. I took one of the files I was working with and provided 4 different time lengths; 3 below 24 hours and 1 above it imported with no problems.

Screen Shot 2023-08-15 at 5.36.36 PM

Can you say which file gives you a problem?

Hi Startz,

I haven't tried them all individually yet but I did try the first two individually. Which are August 5th, 2022 (contains July 2022 data) & September 8th, 2022 (contains August 2022 data).

After trying them I got the same problem.

I just imported both of those files with no problem.

Maybe you could post the exact code that causes a problem together with the error message.

Hi Startz,

Thanks for all the help.

I'm using
> july <- ("July 2022.csv")
> read_csv(july)

But I think i found what may be the problem.

I looked at one of the problem rows in July 2022 and found there was a date in the column 14. In the formula bar it says 1/7/1900 while there aren't dates in the others. I think times over 23:59:59 might be including dates. I did format the whole column as 37:30:55