How to swich strings with condition?

I have quite a big and messy dataset, where all the dates have different formats:

either 01.03.2022 or 03/01/2022

I would like to transform them to the same format.

So far I managed to change all "/" into a ".", but now the month and the day are in the wrong order.

Does anyone know how to switch strings seperated by the first slash, but not by the first point?

Furthermore it need to applied to the whole dataframe, because all date variables have the same problem.

Thank you so much.


## create a tibble

data <- tibble(
  name = c("Josh", "Jasi", "Sophie", "Leni"),
  date = c("1/17/1990", "24.09.1865", "3/13/2000", "14.04.2000")
)

## change / to a . in the date

data <- data.frame(lapply(data, function(x) {
  gsub("/", ".", x)
}))


## print new data

print(data)


The code throws warnings because of the way case_when evaluates the test conditions but the result is correct.

library(tibble)
#> Warning: package 'tibble' was built under R version 4.1.2
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(stringr)
data <- tibble(
  name = c("Josh", "Jasi", "Sophie", "Leni"),
  date = c("1/17/1990", "24.09.1865", "3/13/2000", "14.04.2000")
)

data <- data |> mutate(date = case_when(
  str_detect(date, "/") ~ mdy(date),
  TRUE ~ dmy(date)
))
#> Warning: 2 failed to parse.
#> Warning: 2 failed to parse.
data
#> # A tibble: 4 x 2
#>   name   date      
#>   <chr>  <date>    
#> 1 Josh   1990-01-17
#> 2 Jasi   1865-09-24
#> 3 Sophie 2000-03-13
#> 4 Leni   2000-04-14

Created on 2022-03-01 by the reprex package (v2.0.1)

But unfortunately I cannot apply it on the whole dateframe.

Could that work with a for loop like that:

## packages

library(dplyr)
library(magrittr)
library(stringr)
library(lubridate)

## create a tibble
data <- tibble(
  name = c("Josh", "Jasi", "Sophie", "Leni"),
  date = c("1/17/1990", "24.09.1865", "3/13/2000", "14.04.2000"),
  date2 = c("1/17/1990", "24.09.1865", "3/13/2000", "14.04.2000")
)

# get colnames containing dates

date_columns <- grep("date", names(data), value = TRUE)

for(i in date_columns){

data <- data |> mutate(i = case_when(
  str_detect(i, "/") ~ mdy(i),
  TRUE ~ dmy(i)))

}

    

because all columns in my dataset containing "dat", but I know its hard to loop over strings.

Does anyone know a solution?

Thats cool, thank you!

If the date columns follow a naming convention, e.g. the column name always contains the text "date", you can use the across() function within mutate() to affect those columns. I wrote a function to handle the date conversion so that the call to mutate(across()) would be cleaner.

library(dplyr)
library(tibble)
library(stringr)
library(lubridate)
data <- tibble(
   name = c("Josh", "Jasi", "Sophie", "Leni"),
   date = c("1/17/1990", "24.09.1865", "3/13/2000", "14.04.2000"),
   date2 = c("1/17/1990", "24.09.1865", "3/13/2000", "14.04.2000")
 )
DateConvert <- function(V) {
   case_when(
     str_detect(V, "/") ~ mdy(V),
     TRUE ~ dmy(V)
   )
 }
data <- data |>  mutate(across(.cols = contains("date"), .fns = DateConvert))
data
# A tibble: 4 x 3
  name   date       date2     
  <chr>  <date>     <date>    
1 Josh   1990-01-17 1990-01-17
2 Jasi   1865-09-24 1865-09-24
3 Sophie 2000-03-13 2000-03-13
4 Leni   2000-04-14 2000-04-14
1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.