Not a definitive answer, but perhaps some indication:
dfTest <- data.frame(StringDate=c("2020-12-01","2020-12-02"),
DateDate=as.Date(c("2020-12-01","2020-12-02")))
dfTest
#> StringDate DateDate
#> 1 2020-12-01 2020-12-01
#> 2 2020-12-02 2020-12-02
sc <- dbplyr::src_memdb()
sdfTest <- dplyr::copy_to(sc, dfTest)
sdfTest
#> # Source: table<dfTest> [?? x 2]
#> # Database: sqlite 3.33.0 [:memory:]
#> StringDate DateDate
#> <chr> <dbl>
#> 1 2020-12-01 18597
#> 2 2020-12-02 18598
unclass(dfTest$DateDate)
#> [1] 18597 18598
Created on 2020-12-17 by the reprex package (v0.3.0)
So my best guess is that R stores dates as doubles counting the number of days since 1970-01-01 (cf. ?Date). As long as you stay in R, that's all fine, but dplyr::copy_to() sends the underlying double representation, and how it is handled depends on what's on the other end. For example, SQLite just stores the double as-is (SQLite has no proper date type).
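The days-since-epoch arithmetic can be cross-checked outside R; a quick sketch in Python (18597 and 18598 are the doubles SQLite stored above):

```python
from datetime import date, timedelta

# R's Date class counts days since the epoch 1970-01-01;
# 18597 is the raw double SQLite stored for "2020-12-01".
epoch = date(1970, 1, 1)
print(epoch + timedelta(days=18597))     # 2020-12-01
print((date(2020, 12, 2) - epoch).days)  # 18598
```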
And in the case of your Spark server, something went wrong: possibly the server does not use the same locale as your computer, or Spark interprets the value as a number of days since 1970 with a different starting point (an off-by-one in the origin). You can experiment with the result of as.Date("1970-01-01") to see whether the error is systematic.
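To make that experiment concrete, a minimal sketch reusing the sc connection from above (against your Spark cluster you would pass your sparklyr connection instead):

```r
# Round-trip the epoch itself through the backend.
# If the origin handling matches, the stored value should be the raw
# double 0; a constant non-zero offset would point to a different origin.
dfOrigin <- data.frame(d = as.Date("1970-01-01"))
sdfOrigin <- dplyr::copy_to(sc, dfOrigin, overwrite = TRUE)
sdfOrigin  # on sqlite this shows d = 0
```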