Importing time (hh:mm:ss) data from Excel to R as minutes

Hello
I'll preface this with the fact that I am fairly new to R.

I have an Excel ark with data as hh:mm:ss that Excel automatically converts to a date: 33:32:00 to 01.01.1900 09:32:00. When importing to R right now, R returns it as a date. I would like to import the data to R so the data is showed in minutes: 33:32:00 would be (60*33)+32 = 2.012.

I have tried different settings in the data preview window to no prevail.

Thanks.

You can read it as text and convert it to minutes later, see this example

library(lubridate)

time <- "33:32:00"
time <- hms(time)
as.numeric(time)/60
#> [1] 2012

Created on 2020-03-08 by the reprex package (v0.3.0.9001)

If you need more specific help, please provide a proper REPRoducible EXample (reprex) illustrating your issue.

Hi @Traels -- what command are you using to import the Excel file?

Thank you for taking your time to answer me.

I was inaccurate in the description of my problem. The problem is for an entire column, not a single "date". Otherwise it seems like you've understood my problem.

In addition, is it possible to omit na values in the column?

library(readxl)
FER3 <- read_excel("til tvilling projekt sington frys minus cpr1.xlsx")

Actually, I understood it is for a column but you haven't provided a reprex or at least sample data so I just gave you an example that you can adapt to your issue. If you can't do it, as I said, provide a reprex so we can give you more specific help.

Ah, sorry. Here is my attempt at an reprex:

library(readxl)
Reprex <- read_excel("~/Desktop/Reprex.xlsx")

I hadn't gotten any further than what you suggested.

I hope this is sufficient. Thanks.

That is not a reprex, please read the guide I gave you and try to make one, chances are that even a failed try would be more useful than a screenshot

Sorry, here goes again:

library(readxl)
library(tidyverse)
Reprex <- read_excel("~/Desktop/Reprex.xlsx")
data.frame(
        ...1 = c(1, 2, 3, 4, 5, 6),
          t3 = c("1900-01-01 08:24:00", NA, NA, NA, NA, "1900-01-01 15:20:00"),
          t4 = c("1900-01-01 10:24:00", NA, NA, NA, NA, "1900-01-01 16:05:00"),
          t5 = c("1900-01-01 10:54:00", NA, NA, NA, NA, "1900-01-02 05:06:00"),
          t8 = c("1900-01-02 03:40:00", NA, NA, NA, NA, "1900-01-02 21:51:00")
)

'data.frame' is supposed to be 'Reprex'

Here's what I would suggest: First, save your Excel file as a csv file (see the 'File Format' menu in Excel). Then run the following:

df <- read_csv("~/Desktop/Reprex.xlsx", 
         col_types = cols(.default = col_character())
         )

Once you do this, could you post the contents of df?

This was a good first step, and you're sure to get a reprex in shape soon :slight_smile:

data.frame(
  stringsAsFactors = FALSE,
       check.names = FALSE,
    `;t3;t4;t5;t8` = c("1;32:24:00;34:24:00;34:54:00;51:40:00","2;;;;","3;;;;","4;;;;","5;;;;",
                       "6;39:20:00;40:05:00;53:06:00;69:51:00")

Thank you! :slight_smile:

I don't think this is the best thing to do because the data has not been correctly imported, it would be better to import them correctly from the beginning but just to exemplify, this would be a way to convert your sample data as it is to minutes.

library(tidyverse)
library(lubridate)

sample_df <- data.frame(
    stringsAsFactors = FALSE,
    check.names = FALSE,
    `;t3;t4;t5;t8` = c("1;32:24:00;34:24:00;34:54:00;51:40:00","2;;;;","3;;;;","4;;;;","5;;;;",
                       "6;39:20:00;40:05:00;53:06:00;69:51:00"))

sample_df %>% 
    separate(1, into = c("row", "t3", "t4", "t5", "t8"), sep = ";") %>% 
    select(-row) %>% 
    mutate_all(~as.numeric(hms(.)) / 60)
#>     t3   t4   t5   t8
#> 1 1944 2064 2094 3100
#> 2   NA   NA   NA   NA
#> 3   NA   NA   NA   NA
#> 4   NA   NA   NA   NA
#> 5   NA   NA   NA   NA
#> 6 2360 2405 3186 4191

I think we could give you a better solution if you post a link to a sample .xlsx file hosted on any cloud storage service (e.g. dropbox, google cloud, box, etc).

I think the problem is that the data in the Excel file is in the custom format '[h]:mm:ss', which doesn't seem to be easily read by read_excel, which can only parse dates from Excel.

Here's solution that uses read_excel() directly, but takes care of assigning the correct data types:

library(tidyverse)
library(readxl)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date

# assuming below is result of:
#  df <- read_excel([your excel file])
#  dput(df)

df <- 
  structure(list(...1 = c(1, 2, 3, 4, 5, 6), t3 = structure(c(-2208958560, 
                                                              NA, NA, NA, NA, -2208958560), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
                 t4 = structure(c(-2208958560, NA, NA, NA, NA, -2208958560
                 ), class = c("POSIXct", "POSIXt"), tzone = "UTC"), t5 = structure(c(-2208958560, 
                                                                                     NA, NA, NA, NA, -2208958560), class = c("POSIXct", "POSIXt"
                                                                                     ), tzone = "UTC"), t8 = structure(c(-2208958560, NA, NA, 
                                                                                                                         NA, NA, -2208958560), class = c("POSIXct", "POSIXt"), tzone = "UTC")), row.names = c(NA, 
                                                                                                                                                                                                              -6L), class = c("tbl_df", "tbl", "data.frame"))
### end of 'structure()' call

# save the date Excel inserts for custom [h]:mm:ss format
xl.origin <- '1900-01-01' %>% as_datetime()

# subtract xl.origin from column values and convert to duration
df %>% 
  # give first column a proper name
  rename(row = ...1) %>% 
  mutate_at(
    .vars = vars(t3:t8),
    .funs = function(column) {(column - xl.origin) %>% as.duration()}
    )
#> # A tibble: 6 x 5
#>     row t3                  t4                  t5                 
#>   <dbl> <Duration>          <Duration>          <Duration>         
#> 1     1 30240s (~8.4 hours) 30240s (~8.4 hours) 30240s (~8.4 hours)
#> 2     2 NA                  NA                  NA                 
#> 3     3 NA                  NA                  NA                 
#> 4     4 NA                  NA                  NA                 
#> 5     5 NA                  NA                  NA                 
#> 6     6 30240s (~8.4 hours) 30240s (~8.4 hours) 30240s (~8.4 hours)
#> # … with 1 more variable: t8 <Duration>

# convert to numeric columns representing minutes, if desired
df %>% 
  # give first column a proper name
  rename(row = ...1) %>% 
  mutate_at(
    .vars = vars(t3:t8),
    .funs = function(column) {(column - xl.origin) %>% as.duration()}
  ) %>% 
  mutate_all(~ as.numeric(., 'minutes'))
#> # A tibble: 6 x 5
#>     row    t3    t4    t5    t8
#>   <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1   504   504   504   504
#> 2     2    NA    NA    NA    NA
#> 3     3    NA    NA    NA    NA
#> 4     4    NA    NA    NA    NA
#> 5     5    NA    NA    NA    NA
#> 6     6   504   504   504   504

Created on 2020-03-08 by the reprex package (v0.3.0)
Does this code make sense to you, @Traels?

@andresrcs, @Traels: I just realized I forgot to include that Excel assigns the hours in excess of 24 to the date-time format it produces, so that 32:00:00 becomes 1901-01-01 8:00:00 AM.

Here's the corrected version:

library(tidyverse)
library(readxl)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date

# assuming below is result of:
#  df <- read_excel([your excel file])
#  dput(df)

df <- 
  structure(list(...1 = c(1, 2, 3, 4, 5, 6), t3 = structure(c(-2208958560, 
                                                              NA, NA, NA, NA, -2208958560), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
                 t4 = structure(c(-2208958560, NA, NA, NA, NA, -2208958560
                 ), class = c("POSIXct", "POSIXt"), tzone = "UTC"), t5 = structure(c(-2208958560, 
                                                                                     NA, NA, NA, NA, -2208958560), class = c("POSIXct", "POSIXt"
                                                                                     ), tzone = "UTC"), t8 = structure(c(-2208958560, NA, NA, 
                                                                                                                         NA, NA, -2208958560), class = c("POSIXct", "POSIXt"), tzone = "UTC")), row.names = c(NA, 
                                                                                                                                                                                                              -6L), class = c("tbl_df", "tbl", "data.frame"))
### end of 'structure()' call

# save the date Excel inserts for custom [h]:mm:ss format
xl.origin <- '1900-01-01' %>% as_datetime()

# subtract xl.origin from column values and convert to duration
duration.df <- 
  df %>% 
  # give first column a proper name
  rename(row = ...1) %>% 
  # subtract origin from all the date-time columns and convert
  mutate_at(
    .vars = vars(t3:t8),
    .funs = function(column) {(column - xl.origin) %>% as.duration()}
    ) %>% 
  # fix Excel quirk of assigning time in excess of 24 hours
  mutate_at(
    .vars = vars(t3:t8),
    .funs = function(column) {column + dhours(24)}
  )

# inspect duration.df
duration.df
#> # A tibble: 6 x 5
#>     row t3                   t4                   t5                  
#>   <dbl> <Duration>           <Duration>           <Duration>          
#> 1     1 116640s (~1.35 days) 116640s (~1.35 days) 116640s (~1.35 days)
#> 2     2 NA                   NA                   NA                  
#> 3     3 NA                   NA                   NA                  
#> 4     4 NA                   NA                   NA                  
#> 5     5 NA                   NA                   NA                  
#> 6     6 116640s (~1.35 days) 116640s (~1.35 days) 116640s (~1.35 days)
#> # … with 1 more variable: t8 <Duration>

# convert to numeric columns representing minutes, if desired
duration.df %>% 
  mutate_all(~ as.numeric(., 'minutes'))
#> # A tibble: 6 x 5
#>     row    t3    t4    t5    t8
#>   <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1  1944  1944  1944  1944
#> 2     2    NA    NA    NA    NA
#> 3     3    NA    NA    NA    NA
#> 4     4    NA    NA    NA    NA
#> 5     5    NA    NA    NA    NA
#> 6     6  1944  1944  1944  1944

Created on 2020-03-08 by the reprex package (v0.3.0)

I used the corrected version and it works a charm. Thanks a ton for the help!

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.