Hello, I have a large data set with over thousands of entries and thousands of dates. There are multiple columns. One of the columns is called "date", and it has thousands of entries with the dates in the format, "1996 - 01- 01", etc. I am trying to separate the date column into three new columns, (Year, Month, Day). How do I do this using tidyverse commands?
Take a look at
lubridate package, specifically
You can use them along with
dplyr::mutate. Something like this:
x %>% dplyr::mutate(year = lubridate::year(date), month = lubridate::month(date), day = lubridate::day(date))
But is there a way to do this with dplyr commands or tidyr commands?
dplyr don't have date manipulation functions.
lubridate is the
tidyverse package for date functions (although it's not one of the "core"
tidyverse packages loaded by the
Based on the information you provided, it looks like your dates might not be in R Date format. The example below is converts the
date column to Date format and then uses
mutate_at to create separate columns for year, month, and day. This is similar to mishabalyasin's answer, but
mutate_at allows all three functions to be applied to the
date column with a single line of code.
library(tidyverse) library(lubridate) # Fake data x = data.frame(date=c("2018 - 01 - 04", "2018 - 02 - 16")) x = x %>% mutate(date = ymd(date)) %>% mutate_at(vars(date), funs(year, month, day)) x date year month day 1 2018-01-04 2018 1 4 2 2018-02-16 2018 2 16
Here's a non-lubridate solution. Though I'd probably still use lubridate because it seems to be more robust to variations, for instance if a date accidentally is encoded "19960101"... I think lubridate::ymd() would still parse it.
library(tidyr) df <- data.frame(date = "1996 - 01- 01", stringsAsFactors = FALSE) df %>% separate(date, sep="-", into = c("year", "month", "day"))
Ahh this is what I was looking for!! Thanks!! I just have one more question. Where it says " data.frame(date=c(" 2018 - 01- 04")), instead of using date = c(), could I use date = [1:500] for a long list of data? Otherwise it would take forever to input all of my dates from my data set
Actually I figured it out!!
If your question's been answered, would you mind choosing a solution? (see FAQ below for how) It makes it a bit easier to visually navigate the site and see which questions still need help.
2 posts were split to a new topic: help with "Error in as.POSIXlt.character(x, tz = tz(x)) : character string is not in a standard unambiguous format."