Spark dataframe date objects

What is the proper way to "mutate" a new date column that converts a chr date column in a Spark dataframe into a proper date column? I see that the as.Date function works sometimes but not others, and the other option is the Hive to_date function.

So if I have a chr date column in a Spark dataframe with values like (say) 1/1/2021, how do I convert it to a "Date" field using sparklyr?

Hi, the problem is that to_date() does not have a "format" option, and I believe as.Date() simply wraps to_date(), so there is no advantage to using the UDF directly. Because we cannot pass the format, we need to pass the string date exactly as to_date() expects it, which is YYYY-M-D. To do so, we need to manipulate the string first by repositioning the date parts, and then pass the result through the string-to-date converter. In the example below, I'm assuming that the first date part is the month and not the day. I'm using the split() UDF to perform an operation similar to strsplit() and thus easily separate the date parts. Note that the resulting array is indexed from 0 in this example, so y2[[0]] is the month, y2[[1]] is the day, and y2[[2]] is the year:

library(sparklyr)
library(dplyr)  # provides mutate() and copy_to()
sc <- spark_connect("local")

test_df <- data.frame(x = 1, y = "1/1/2021")
test_tbl <- copy_to(sc, test_df, overwrite = TRUE)

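test_tbl %>% 
  mutate(
    # split "M/D/YYYY" into an array of its three parts
    y2 = split(y, "/"),
    # reorder to year-month-day; the array is indexed from 0 here
    new_y = paste0(y2[[2]], "-", y2[[0]], "-", y2[[1]]),
    # convert the reordered string to a date
    date_y = as.Date(new_y)
    )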
#> # Source: spark<?> [?? x 5]
#>       x y        y2         new_y    date_y    
#>   <dbl> <chr>    <list>     <chr>    <date>    
#> 1     1 1/1/2021 <list [3]> 2021-1-1 2021-01-01

Created on 2022-03-24 by the reprex package (v2.0.1)
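
For anyone who wants to check the reordering logic without a Spark connection, here is a minimal local sketch of the same idea in base R using strsplit(). It only illustrates the string manipulation and is not what runs inside Spark. The helper name reorder_mdy() is made up for this example, and it assumes month-first input, as above. Note that base R vectors are indexed from 1, while the Spark example above uses indices starting at 0.

```r
# Local base-R sketch of the same reordering (no Spark needed).
# reorder_mdy() is a hypothetical helper name, not part of sparklyr.
# Assumes the input is month/day/year, e.g. "1/1/2021".
reorder_mdy <- function(x) {
  # split each string on the literal "/" character
  parts <- strsplit(x, "/", fixed = TRUE)
  # base R is 1-indexed: p[1] = month, p[2] = day, p[3] = year
  vapply(
    parts,
    function(p) paste0(p[3], "-", p[1], "-", p[2]),
    character(1)
  )
}

iso <- reorder_mdy(c("1/1/2021", "12/31/2020"))
dates <- as.Date(iso)  # year-month-day strings parse with the default format
```

If the source data were day-first instead, swapping p[1] and p[2] here (or y2[[0]] and y2[[1]] in the Spark version) would be the only change needed.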

Hope this helps
