Columns being read in by read_excel as a date instead of numeric

I have used read_excel from to read in a file. It picks up one of the columns as a date, instead of numeric. Is there a way of fixing this? I am aware of col_types within read_excel, but there are 39 columns, so am not going to enter them all.

read_in <- structure(list(read_in = structure(c(-2201817600, 
-2206915200, NA, -2204755200, -2200953600, -2200521600, -2205619200, 
-2204755200), tzone = "UTC", class = c("POSIXct", "POSIXt"
))), row.names = c(NA, -8L), class = c("tbl_df", "tbl", "data.frame"
))

actual <- c(85, 25, NA, 50, 95, 100, 40, 50)

converted <- read_in %>% 
  mutate(decrease = as.numeric(decrease))

df <- tibble(read_in, actual) %>% 
  mutate(converted = as.numeric(read_in)/100)

The converted number doesn't make sense and doesn't work (even after trying to add various numbers to it - not shown here):

# A tibble: 8 x 3
  read_in             actual converted
  <dttm>               <dbl>     <dbl>
1 1900-03-25 00:00:00     85 -22018176
2 1900-01-25 00:00:00     25 -22069152
3 NA                      NA        NA
4 1900-02-19 00:00:00     50 -22047552
5 1900-04-04 00:00:00     95 -22009536
6 1900-04-09 00:00:00    100 -22005216
7 1900-02-09 00:00:00     40 -22056192
8 1900-02-19 00:00:00     50 -22047552

The origins seem to be different too:

as.Date(85, origin = "1899-12-30")
as.Date(25, origin = "1899-12-31")
as.Date(50, origin = "1899-12-31")

How can this be fixed?

I am not sure I understand what is in the excel file. Does the column contain, for example, the number 85 and it gets interpreted as a Date?

The difference in origin dates is due, I think, to Excel treating 1900-02-29 as a valid date though it is not. Any date after 1900-02-28 therefore has a value one larger than it should and the origin has to be set to 1899-12-30 for R to translate it correctly.

You can make a a vector for the col_types argument fairly easily with the rep() function. If the problematic column is the 23 column, you could use

ColType <- c(rep("guess", 22), "numeric", rep("guess", 16))

and pass that to col_types.

Yes, it is the number 85 in Excel and gets read in as a date when using read_excel.

Thanks for the tip about rep() and col_types.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.