Hi, the problem is that to_date() does not have a "format" option, and I believe as.Date() simply wraps to_date(), so there is no advantage to calling the UDF directly. Because we cannot pass a format, the date string has to be in exactly the layout to_date() expects, which is YYYY-M-D. That means rearranging the date parts in the string first, and then passing the result through the string-to-date converter. In the example below, I'm assuming that the first date part is the month and not the day. I'm using the split() UDF, which works much like strsplit(), to separate the date parts easily:
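library(sparklyr)
library(dplyr)  # provides mutate()

sc <- spark_connect("local")
test_df <- data.frame(x = 1, y = "1/1/2021")
test_tbl <- copy_to(sc, test_df, overwrite = TRUE)

test_tbl %>%
  mutate(
    # split() is passed through to Spark and returns an array of the date parts
    y2 = split(y, "/"),
    # Spark arrays are zero-indexed: [[0]] = month, [[1]] = day, [[2]] = year
    new_y = paste0(y2[[2]], "-", y2[[0]], "-", y2[[1]]),
    date_y = as.Date(new_y)
  )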
#> # Source: spark<?> [?? x 5]
#>       x y        y2         new_y    date_y
#>   <dbl> <chr>    <list>     <chr>    <date>
#> 1     1 1/1/2021 <list [3]> 2021-1-1 2021-01-01
Created on 2022-03-24 by the reprex package (v2.0.1)
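For comparison, here is a rough sketch of the same reordering in plain local R with strsplit(), no Spark involved. The variable names (parts, new_y, date_y) are only for illustration, and it makes the same assumption that the first part is the month. The thing to watch is the indexing: R vectors start at 1, while the Spark array in the example above starts at 0.

```r
# Local base-R sketch of the same idea (illustrative only, not Spark code)
y <- "1/1/2021"

# strsplit() returns a list with one character vector per input string
parts <- strsplit(y, "/", fixed = TRUE)[[1]]

# R is 1-indexed: parts[1] = month, parts[2] = day, parts[3] = year
new_y <- paste0(parts[3], "-", parts[1], "-", parts[2])

# as.Date() parses the year-month-day string
date_y <- as.Date(new_y)
```

If the first part were the day instead, swapping parts[1] and parts[2] (or y2[[0]] and y2[[1]] in the Spark version) should be the only change needed.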
Hope this helps