Switch Dates from dmy to mdy without lubridate

Hey there,

regarding my last question, I need to switch the dmy formate to mdy, but lucridate doesn´t recognize somehow many cells as a date and as a result, many NA´s appear.

So I decided to do it with stringr. That also works fine, but I am not able to apply the calculation for each row respectively. So the values are right, but there are all the same for each column.

Does anyone how to fix this?

Thank you :slight_smile:

library(dplyr)
library(stringr)

data <- tibble(
  name = c("Josh", "Jasi", "Sophie", "Leni"),
  b.date = c("1.17.1990", "24.09.1865", "03.12.2000", "10.04.2000"),
  a.date = c("4.13.1990", "02.03.1865", "03.04.2000", "11.04.2000"))
  
# change alle dates from dmy to mdy

DateConvertmdy <- function(V) {
  pieces <-  str_split(V, "\\.")
  
  day <- pieces[[1]][1]
  
  month <- pieces[[1]][2]
  
  year <- pieces[[1]][3]
  
  str_glue("{month}/{day}/{year}")
  
}


data <- data |>  mutate(across(.cols = contains("dat"), DateConvertmdy))


# transform NA/NA/NA into NA

data[data == "NA/NA/NA"] = NA

but lucridate doesn´t recognize somehow many cells as a date and as a result, many NA´s appear.
So I decided to do it with stringr.

But this is not a lubridate problem that can be approach by using another toolset, its a fundamental data problem because of poor data ...

I'll explain the issue to you explicitly:
you say that you want to transform from day/month/year to month/day/year, but your examples are contain clearly impossible dates if they are consistently interpreted as dmy (or mdy !)

consider only the first two entries of b.date are either
a) the 1st of the 17th month of 1990 , and 24th of september 1865
or
b) 17th Jan 1990, and 9th of the 24th month of 1865.

These are inconsistent and therefore applying a consistent interpretation on both will inevitably lead to one of them being NA

1 Like

Hi @oli.m
Like @nirgrahamuk mentioned, the lack of consistency in date format cannot be solved entirely with stringr, and I guess you will still need a consistent date format for your analytics.

I suggest a solution below, with 3 basic scenarios:

  • do you have at least one clear dmy format? --> meaning first number in the date is an integer > 12
  • do you have at least one clear mdy format? --> meaning second number in the date is an integer > 12
  • is it undefined? --> other cases, when each of two 1st numbers in date <= 12, so we don't know which format, I put the mdy format here but you can decide.

My options below don't cover every possible case and some possible wrong entries.
I did the pattern detection row by row pasting all dates, thinking that the person entering b.date for Jasi will use the same format as for a.date.

So these are some "naive" assumptions about your data but when you apply it on your whole dataset, you can take a look at dates that were not parsed right and adapt the code accordingly.

library(tidyverse); library(lubridate)

data <- tibble(
  name = c("Josh", "Jasi", "Sophie", "Leni"),
  b.date = c("1.17.1990", "24.09.1865", "03.12.2000", "10.04.2000"),
  a.date = c("4.13.1990", "02.03.1865", "03.04.2000", "11.04.2000"))


data %>% 
  unite("pasted", grep("dat", names(data), value = TRUE), sep = " ", remove = FALSE) %>% 
  mutate(across(contains("dat"), 
                ~case_when(
                  str_detect(pasted, "(^|\\s+)(1[3-9]|[23][0-9])\\.") ~ dmy(.),
                  str_detect(pasted, "\\.(1[3-9]|[23][0-9])\\.") ~ mdy(.),
                  TRUE ~ mdy(.)))) %>% select(-pasted) %>% 
  suppressWarnings()
1 Like

And last thing, you mentioned that you wanted the mdy format.

Just note that a date object is shown as YYYY-MM-DD (ISO 8601 - Wikipedia). There are two ways of changing it that I know about: change your locale, or convert the way it prints like below, just be aware that it coerces the type to characters.

#parse dates with two format conditions:
data2 <- data %>% 
  unite("pasted", grep("dat", names(data), value = TRUE), sep = " ", remove = FALSE) %>% 
  mutate(across(contains("dat"), 
                ~case_when(
                  str_detect(pasted, "(^|\\s+)(1[3-9]|[23][0-9])\\.") ~ dmy(.),
                  str_detect(pasted, "\\.(1[3-9]|[23][0-9])\\.") ~ mdy(.),
                  TRUE ~ mdy(.)))) %>% select(-pasted) %>% 
  suppressWarnings()

#convert yyyy/mm/dd date format to mm/dd/yyyy character format: 
data2 %>% mutate(across(contains("dat"), ~format(., "%m/%d/%Y")))

1 Like

wow, thank you a lot! The regex covers almost everything in my case, nice idea.

I didnt know why its always displayed after ISO 8601 - now I do :slight_smile:

I totally know that it is not lubridates problem.
Thank you for help @nirgrahamuk

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.