Converting dates (Year - Month - Day) to 3 separate columns (Year , Month , Day)

tidyr
readxl
tidyverse

#1

Hello, I have a large data set with over thousands of entries and thousands of dates. There are multiple columns. One of the columns is called "date", and it has thousands of entries with the dates in the format, "1996 - 01- 01", etc. I am trying to separate the date column into three new columns, (Year, Month, Day). How do I do this using tidyverse commands?


#2

Take a look at lubridate package, specifically lubridate::year, lubridate::month, and lubridate::day.

You can use them along with dplyr::mutate. Something like this:

x %>%
  dplyr::mutate(year = lubridate::year(date), 
                month = lubridate::month(date), 
                day = lubridate::day(date))

#3

Thanks!

But is there a way to do this with dplyr commands or tidyr commands?


#4

tidyr and dplyr don't have date manipulation functions. lubridate is the tidyverse package for date functions (although it's not one of the "core" tidyverse packages loaded by the tidyverse library).

Based on the information you provided, it looks like your dates might not be in R Date format. The example below is converts the date column to Date format and then uses mutate_at to create separate columns for year, month, and day. This is similar to mishabalyasin's answer, but mutate_at allows all three functions to be applied to the date column with a single line of code.

library(tidyverse)
library(lubridate)

# Fake data
x = data.frame(date=c("2018 - 01 - 04", "2018 - 02 - 16"))

x = x %>% 
  mutate(date = ymd(date)) %>% 
  mutate_at(vars(date), funs(year, month, day))

x
        date year month day
1 2018-01-04 2018     1   4
2 2018-02-16 2018     2  16

#5

Here's a non-lubridate solution. Though I'd probably still use lubridate because it seems to be more robust to variations, for instance if a date accidentally is encoded "19960101"... I think lubridate::ymd() would still parse it.

library(tidyr)

df <- data.frame(date = "1996 - 01- 01", stringsAsFactors = FALSE)

df %>%
  separate(date, sep="-", into = c("year", "month", "day"))

#6

Ahh this is what I was looking for!! Thanks!! I just have one more question. Where it says " data.frame(date=c(" 2018 - 01- 04")), instead of using date = c(), could I use date = [1:500] for a long list of data? Otherwise it would take forever to input all of my dates from my data set


#7

Actually I figured it out!!


#8

If your question's been answered, would you mind choosing a solution? (see FAQ below for how) It makes it a bit easier to visually navigate the site and see which questions still need help.

Thanks