Importing Dates formatted as integers in R

I want to convert column types from integer to date after import of a XLSX data-frame using read_xlsx, where columns of type date have been been wrongly guessed as dbl (the data-frame also contains several other columns that should not be converted).
Doing this during import using the col_type argument does not work, because this would subsetting the data in the col_type argument, which is not possible (as far as I see - Import dates from XLSX).

I found this thread Using across to convert column types, but it unfortunately does not work - not even as described there.

Because it is conversion from Excel dates stored as dbl, using the as_date function of lubridate the correct origin using the origin argument has to be set.

I get an error across() must only be used inside dplyr verbs. with this code:

read_xlsx("dataframe.xlsx") %>% 
  mutate(across(column3:column6 & where(is.double), as_date(., origin = "1899-12-30")))

However, this syntax works with "simple" conversion like:

read_xlsx("dataframe.xlsx") %>% 
  mutate(across(column3:column6 & where(is.character), as.double))

Of course, it could be done separately for each column:

read_xlsx("dataframe.xlsx") %>% 
  mutate(column3 = as_date(column3, origin = "1899-12-30")) %>% 
  mutate(column4 = as_date(column4, origin = "1899-12-30"))
  ... and so forth...

But this is rather unhandy, in addition since input column type has to be checked/corrected for every column, and not applicable to large datasets.

How can I fix this problem and convert from dbl to date using the across- and as_date-funtions?
Or are there other "handy" possibilities for conversion to date-type?

1 Like

Hello,

Here is a way to do exactly what you want. As you can see we only mutate when we find numeric columns and then we do a specific as date conversion. You will be able to adjust it for your specific conditions.

library(tidyverse)

df <- data.frame( dat = c(20200101,20200113,20200213),
                  a = c("h","g","b"),
                  idat = c(20200101,20200113,20200213))


output_df <- 
df  %>%
  mutate_if(is.numeric, ~ as.Date(as.character(.), format = '%Y%m%d'))

output_df
#>          dat a       idat
#> 1 2020-01-01 h 2020-01-01
#> 2 2020-01-13 g 2020-01-13
#> 3 2020-02-13 b 2020-02-13

Created on 2021-10-26 by the reprex package (v2.0.1)

1 Like

I think for the . to have your desired meaning within a mutate across it must be preceeded by a tilde ~

No, I already tried with ~ and ~. - and just checked once more - doesn't work

Unfortunately this doesn’t do what I want, since the dates imported from XLSX are in Windows date formatting, i.e. as number of days from origin which for Win is "1899-12-30".

This is a sample of how the dates are imported:

KoopArt Beginn Ende

1 KV 43221 44316
2 V 40436 41167
3 KV 42178 42369
4 V 43101 43281
5 ZV 43739 44469
6 V 43160 44286
7 KV 43070 43465
8 ZV 43831 44469
9 ZV 43739 44469
10 V 43374 43951

Can you give 5 rows of data or something so we can look at it properly? I am not sure if we can just detect numbers and convert on your data. We need more context...

df1 <- data.frame(
  stringsAsFactors = FALSE,
  KoopArt = c("KV", "V", "KV", "V", "ZV", "V", "KV", "ZV", "ZV", "V"),
  Beginn = c(43221L,40436L,42178L,43101L,
             43739L,43160L,43070L,43831L,43739L,43374L),
  Ende = c(44316L,41167L,42369L,43281L,
           44469L,44286L,43465L,44469L,44469L,43951L)
)

library(tidyverse)

mutate(df1,
       across(where(is.integer), 
              ~ as.Date(., origin = "1899-12-30")))
1 Like

Unfortunately I don't know how to attach as sample XLSX file

Perfect - this works :smiley:
Thank you

I use read_excel with a wrapper and coerce columns to given classes using its col_types functionality. Has not been a problem since then and means very little extra code

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.