Force reading of dates from Excel as character strings

Hi,

We process a lot of Excel files and currently do that using Excel macros. I'm looking at converting some of our processes to R as it should make them much more efficient.

The problem I am having is that the nature of our data is that the date column can contain a wide variety of data types. Some examples are simply a date (formatted as dd/mm/ccyy), a year (four digit integer), date ranges (e.g. 20/05/2020 - 07/07/2020), text descriptions ("Summer 2018"), ranges as text ("Mar 1978 - Jan 1979"), year ranges ("1935-1937").

Any of our data files may have dates in a single format or in various formats.

Ultimately we store the "Date" as a string in our database.

I'm trying to force R to treat the date column as text but all my attempts so far have resulted in dttm values or a conversion to an integer. I've tried various different packages including readxl, xlsx, gdata, openxlsx and I've tried specifying the column types on import.

I've tried creating an empty dataframe with the columns defined as character & reading into that but it converts the column when adding the data.

I've also tried converting the date column from a date format to character format, but can't get that to work reliably either.

I could save the Excel files as CSV and try processing them like that but I'd rather avoid extra steps in Excel if I can.

Can anyone help? Is there a particular package that it would be best to persevere with? Despite spending all morning searching I haven't found any examples of anyone trying to do that same thing (most people seem to have character dates that that want to convert to date formats).

Thanks!

You can use the col_types argument in readxl::read_excel

Either NULL to guess all from the spreadsheet or a character vector containing one entry per column from these options: "skip", "guess", "logical", "numeric", "date", "text" or "list". If exactly one col_type is specified, it will be recycled.

If you have tried this and it is not working, please can you show the code you have used.

The code is...

library(readxl)
XL_data <- read_excel("I:\\ERIC\\R\\Test data\\ZBamburgh plant list 26 Aug 2020.xlsx",col_types = "text")

This input file has a date of 26/08/2020 in all rows. The resulting dataframe does have the Date column as character but converts the value to 44069.

I would like it to contain the string "26/08/2020".

Excel is storing the date as a number (days since 1st January 1900), and passing that number to R. Might be easiest to import as a date and then coerce to a character vector.

Something like

df <- read_excel("myfile.xlsx")
df <- df %>% mutate(Date = as.character(Date))

Thanks, I'll have a play with that. I think the files we have with mixed date formats might be an issue.
I'll try to post a snippet of code once I get a reasonable solution.

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.