changing column from character to date format and datetime to date

Hello! I am very novice and trying to learn R. Working on the Google Analytics cert with the Fitbit data. I am wanting to convert the date info in activity and sleep data. I have spent time reading a number of articles and attempting different things- all are yielding errors.

Here is what I have loaded:

#Install packages and load libraries
install.packages("tidyverse")
install.packages("janitor")
install.packages("lubridate")

library(tidyverse)
library(lubridate)
library(janitor)

#Load the data we plan to use and create objects for it
activity <- read.csv("dailyActivity_merged.csv")
sleep <- read.csv("sleepDay_merged.csv")
weight <- read.csv("weightLogInfo_merged.csv")

#I like the format of the ActivityDate column but its in character and I want it to be Date format but have been unsuccessful at changing it. I have tried using mutate and as.Date. I am at a loss on how to change it would appreciate any guidance.

activity2 <- select (activity,-c(TrackerDistance, LoggedActivitiesDistance))

activity2 %>%
mutate(activity2, ActivityDate= Date(mdy)/ weekdays(Date, abbreviate = FALSE)) %>%
rename(date = ActivityDate)

This is the errors:
as.Date(ActivityDate, format)
Error in as.Date(ActivityDate, format) : object 'ActivityDate' not found

as.Date(activity2, format)
Error in as.Date.default(activity2, format) :
do not know how to convert 'activity2' to class “Date”
activity2 <- as.Date(c(ActivityDate))
Error in as.Date(c(ActivityDate)) : object 'ActivityDate' not found

mutate(activity2, ActivityDate= Date(mdy)/ weekdays(Date, abbreviate = FALSE)) %>%

  • rename(date = ActivityDate)
    Error in mutate():
    ! Problem while computing ActivityDate = Date(mdy)/weekdays(Date, abbreviate = FALSE).
    Caused by error in rep.int():
    ! invalid type (closure) for 'times' (must be a vector)
    Run rlang::last_error() to see where the error occurred.

#I need help changing the SleepDay column. It is in a date time format and I just want the Date in mdy format without time.
sleep %>%
separate (col= SleepDay, c("date", "sleep_time"), sep="")
mutate(date = mdy(date), week_day = weekdays(SleepDate)) %>%
select(-"sleep_time")

This is the error:
separate (col= SleepDay, c("date", "sleep_time"), sep="")
Error in UseMethod("separate") :
no applicable method for 'separate' applied to an object of class "character"

mutate(date = mdy(date), week_day = weekdays(SleepDate)) %>%

  • select(-"sleep_time")
    Error in as.character(x) :
    cannot coerce type 'closure' to vector of type 'character'

In addition I want to be able to add a weekdays column from the date and I have looked at how someone else was able to get it to work but when I copied the format I got errors. I dont know what Im doing wrong. I tried using the weekdays function and just get errors, If anyone can help guide me with how to chage the date column info in addition to adding a column for weekdays that would be amazing!!

Thank you

Here are some examples of the kinds of transformations you want to do. Note that you have to tell as.Date() what format the characters are in, not the format you want in the end.

library(dplyr)
DF <- data.frame(Date = c("7/4/2021", "1/21/2020"))
str(DF)
#> 'data.frame':    2 obs. of  1 variable:
#>  $ Date: chr  "7/4/2021" "1/21/2020"
DF <- DF |> mutate(Date = as.Date(Date, format = "%m/%d/%Y"))
str(DF)
#> 'data.frame':    2 obs. of  1 variable:
#>  $ Date: Date, format: "2021-07-04" "2020-01-21"
DF <- DF |> mutate(Weekday = weekdays(Date))
str(DF)
#> 'data.frame':    2 obs. of  2 variables:
#>  $ Date   : Date, format: "2021-07-04" "2020-01-21"
#>  $ Weekday: chr  "Sunday" "Tuesday"


DF2 <- data.frame(DateTime = as.POSIXct(c("2021-05-21 13:45:13",
                               "2021-05-23 09:15:31")))
str(DF2)
#> 'data.frame':    2 obs. of  1 variable:
#>  $ DateTime: POSIXct, format: "2021-05-21 13:45:13" "2021-05-23 09:15:31"
DF2 <- DF2 |> mutate(Date = as.Date(DateTime))
DF2
#>              DateTime       Date
#> 1 2021-05-21 13:45:13 2021-05-21
#> 2 2021-05-23 09:15:31 2021-05-23

Created on 2022-05-14 by the reprex package (v2.0.1)

Hi! Thanks for responding. I feel like Im getting closer with the code that you provided but when I ran it, it changed the format from mdy using char to y-m-d char and thats not what I wanted. In addition it is changing the format from char to NA. What amI doing wrong here?

activity2 <- data.frame(Date = c("2020-04-12"))
str(activity2)
activity2 <- activity2 |> mutate(Date = as.Date(Date, format = "%y/%m/%d"))
str(activity2)

Here is print out from console:
|> view(activity2) > activity2 <- data.frame(Date = c("2020-04-12")) > str(activity2) 'data.frame': 1 obs. of 1 variable: Date: chr "2020-04-12" > activity2 <- activity2 |> mutate(Date = as.Date(Date, format = "%m/%d/%y")) > str(activity2) 'data.frame': 1 obs. of 1 variable: Date: Date, format: NA > activity2 <- activity2 |> mutate(Date = as.Date(Date, format = "%y/%m/%d")) > str(activity2) 'data.frame': 1 obs. of 1 variable: $ Date: Date, format: NA|

The colume name is ActivityDate in the dataframe. Am I supposed to be putting that in where the word Date is for the code? But now I need help putting it back to m-d-y as it was and to date instead of character. Im not sure why the code you gave me did that.

That's the format R uses to display dates. Basically, if you have something which is of the class "Date" that's what you will see.

If you want it to appear differently, you will have to store it as a character string. You might need to work with dates for data wrangling and calculations, then covert them to the format you want as a last step.

You can find more detail here: https://r-lang.com/r-date-format/.

You actually have 2 problems. You have %y instead of %Y, so R is looking for a 2 digit year, not the 4 digit year you supplied. (If you scroll down in the link above, you will see all the %Y, %y codes explained.)

You are also getting NA because the string you supply has a format of %y-%m-%d -- notice the dashes, but you are telling R to look for something in %y/%m/%d -- notice the slashes.

This should work fine:
as.Date("2020-04-12", format = "%Y-%m-%d")

I did see that you had the lubricate library loaded, which has a very helpful ymd function (for parsing dates from strings which are in year-month-date format). I also mdy (month-day-year) and similar functions.

It's more flexible and you can just provide it the date-like character string.

library(lubridate)
ymd(c("2020-04-12", "2021/04/13"))

Check out the lubridate cheat sheet.

Lubridate is honestly a lifesaver if you work with a lot of dates in R.

Hope that helps,

Luke

1 Like

Thanks Luke for your answer. I changed the Y and dashes and ran the code and nothing changed. I dont know how to get the dates to be what I want and just feel like it isnt meant to be for me. I read the cheat sheets and have read a number of articles but I need someone to help me actually write the code the way it needs to be otherwise Im just wasting time plugging in random stuff and getting nowhere.

The best thing to do if you are still stuck is to post some of your data so people here can work with the same data. Let's start with your data frame named activity2. Please post the output of

dput(head(activity2))

When you post that output, place a line with three back ticks just before and just after it, like this
```
Output of dput() goes here.
```
The output of the dput() function will be the text of a structure() function. Here is an example from a data frame I happen to have at the moment named DFnew

Output of dput(head(DFnew))

structure(list(Date = c("2020-01-01", "2020-01-01", "2020-02-02", 
"2020-02-02", "2020-01-01", "2020-01-01"), Material = c("nails", 
"nails", "nails", "nails", "hammer", "hammer"), Quantity = c(10L, 
100L, 50L, 70L, 20L, 80L), DailySum = c(210L, 210L, 190L, 190L, 
210L, 210L), Weight = c(0.0476190476190476, 0.476190476190476, 
0.263157894736842, 0.368421052631579, 0.0952380952380952, 0.380952380952381
), ScaledQuantity = c(0.476190476190476, 47.6190476190476, 13.1578947368421, 
25.7894736842105, 1.9047619047619, 30.4761904761905)), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -6L), groups = structure(list(
    Date = c("2020-01-01", "2020-02-02"), .rows = structure(list(
        c(1L, 2L, 5L, 6L), 3:4), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -2L), .drop = TRUE))

Your output will be different but it will begin with structure() and include the various columns, as you can see in what I posted.

1 Like

Hi Katherine,

Ditto FJCC's message as far as facilitating us to help you edit your actual code.

Sorry if I misunderstood what you need. I originally understood that you wanted a Date, but in mdy format. The problem was that R just doesn't do this though; R displays dates at Y-m-d (e.g. 2022-05-16).

If you want and need Date class data (which you need for things like calculating the differences between days, arranging data from oldest to newest, etc.) you will need to work with the data as Dates.

I would highly recommend doing any wrangling and processing of the data with your dates as a Date data type, then as a last step you can add something to convert the date into a character with the format you need. (If you don't do this, you're likely to see some unexpected behavior because you will see the variable as a Date, but R will not. Trust me it happens and it's a huge and confusing bummer.)

This could look something like:

myData.fmtDate <- myData |>
  mutate(DispDate = format(Date, "%m/%d/%Y")

Ny the way, one easy way to see the class of your various variables is to use a tibble (in short, it's like a dataframe, but it displays in a much more friendly and useful format at the console. One thing it does is show you the data type of the variables, like this:

# A tibble: 1,000 x 2
   Date        coeff
   <date>      <dbl>
 1 1970-05-25  36.8 
 2 1970-05-26   4.52
 3 1970-05-27 142.  
 4 1970-05-28  59.4 
 5 1970-05-29 115.  
 6 1970-05-30  73.4 

I hope that helps.

Thanks,

Luke

Here is the output for Activity. When I logged in it only had 2 observations so I went up to a previous line of code and ran it. Its back to mdy, character format. Id like to be able to add a weekday column next to it.

 dput(head(activity2))
structure(list(Id = c(1503960366, 1503960366, 1503960366, 1503960366, 
1503960366, 1503960366), ActivityDate = c("4/12/2016", "4/13/2016", 
"4/14/2016", "4/15/2016", "4/16/2016", "4/17/2016"), TotalSteps = c(13162L, 
10735L, 10460L, 9762L, 12669L, 9705L), TotalDistance = c(8.5, 
6.96999979019165, 6.73999977111816, 6.28000020980835, 8.15999984741211, 
6.48000001907349), VeryActiveDistance = c(1.87999999523163, 1.57000005245209, 
2.44000005722046, 2.14000010490417, 2.71000003814697, 3.19000005722046
), ModeratelyActiveDistance = c(0.550000011920929, 0.689999997615814, 
0.400000005960464, 1.25999999046326, 0.409999996423721, 0.779999971389771
), LightActiveDistance = c(6.05999994277954, 4.71000003814697, 
3.91000008583069, 2.82999992370605, 5.03999996185303, 2.50999999046326
), SedentaryActiveDistance = c(0, 0, 0, 0, 0, 0), VeryActiveMinutes = c(25L, 
21L, 30L, 29L, 36L, 38L), FairlyActiveMinutes = c(13L, 19L, 11L, 
34L, 10L, 20L), LightlyActiveMinutes = c(328L, 217L, 181L, 209L, 
221L, 164L), SedentaryMinutes = c(728L, 776L, 1218L, 726L, 773L, 
539L), Calories = c(1985L, 1797L, 1776L, 1745L, 1863L, 1728L)), row.names = c(NA, 
6L), class = "data.frame")

I am also needing too have a date column in sleep data frame too. I dont need the time and I will want to merge these two on ID and Date, if possible, with one column for weekday.Here is the output for dput(head(sleep))

> dput(head(sleep))
structure(list(Id = c(1503960366, 1503960366, 1503960366, 1503960366, 
1503960366, 1503960366), SleepDay = c("4/12/2016 12:00:00 AM", 
"4/13/2016 12:00:00 AM", "4/15/2016 12:00:00 AM", "4/16/2016 12:00:00 AM", 
"4/17/2016 12:00:00 AM", "4/19/2016 12:00:00 AM"), TotalSleepRecords = c(1L, 
2L, 1L, 2L, 1L, 1L), TotalMinutesAsleep = c(327L, 384L, 412L, 
340L, 700L, 304L), TotalTimeInBed = c(346L, 407L, 442L, 367L, 
712L, 320L)), row.names = c(NA, 6L), class = "data.frame")

Thanks for all the guidance!

Here is how I would do the transformations on your data. In the first example I put in an alternative to use functions from the lubridate package.

library(dplyr)
library(lubridate)

activity2 <- structure(list(Id = c(1503960366, 1503960366, 1503960366, 1503960366, 
                                   1503960366, 1503960366), 
                            ActivityDate = c("4/12/2016", "4/13/2016", "4/14/2016", 
                                             "4/15/2016", "4/16/2016", "4/17/2016"), 
                            TotalSteps = c(13162L, 10735L, 10460L, 9762L, 12669L, 9705L), 
                            TotalDistance = c(8.5, 6.96999979019165, 6.73999977111816, 
                                              6.28000020980835, 8.15999984741211, 6.48000001907349), 
                            VeryActiveDistance = c(1.87999999523163, 1.57000005245209, 2.44000005722046, 
                                                   2.14000010490417, 2.71000003814697, 3.19000005722046), 
                            ModeratelyActiveDistance = c(0.550000011920929, 0.689999997615814, 
                                                         0.400000005960464, 1.25999999046326, 0.409999996423721, 
                                                         0.779999971389771), 
                            LightActiveDistance = c(6.05999994277954, 4.71000003814697, 3.91000008583069, 
                                                    2.82999992370605, 5.03999996185303, 2.50999999046326), 
                            SedentaryActiveDistance = c(0, 0, 0, 0, 0, 0), 
                            VeryActiveMinutes = c(25L, 21L, 30L, 29L, 36L, 38L), 
                            FairlyActiveMinutes = c(13L, 19L, 11L, 34L, 10L, 20L), 
                            LightlyActiveMinutes = c(328L, 217L, 181L, 209L, 221L, 164L), 
                            SedentaryMinutes = c(728L, 776L, 1218L, 726L, 773L, 539L), 
                            Calories = c(1985L, 1797L, 1776L, 1745L, 1863L, 1728L)), 
                       row.names = c(NA, 6L), class = "data.frame")
class(activity2$ActivityDate)
#> [1] "character"
activity2 <- activity2 |> mutate(ActivityDate = as.Date(ActivityDate, format = "%m/%d/%Y"),
                                 Weekday = weekdays(ActivityDate))
#alternative code with lubridate
#activity2 <- activity2 |> mutate(ActivityDate = mdy(ActivityDate),
#                                 Weekday = wday(ActivityDate, label = TRUE))

class(activity2$ActivityDate)
#> [1] "Date"
print(activity2$Weekday)
#> [1] "Tuesday"   "Wednesday" "Thursday"  "Friday"    "Saturday"  "Sunday"

sleep <- structure(list(Id = c(1503960366, 1503960366, 1503960366, 1503960366, 
                               1503960366, 1503960366), 
                        SleepDay = c("4/12/2016 12:00:00 AM", "4/13/2016 12:00:00 AM", 
                                     "4/15/2016 12:00:00 AM", "4/16/2016 12:00:00 AM", 
                                     "4/17/2016 12:00:00 AM", "4/19/2016 12:00:00 AM"), 
                        TotalSleepRecords = c(1L, 2L, 1L, 2L, 1L, 1L), 
                        TotalMinutesAsleep = c(327L, 384L, 412L, 340L, 700L, 304L), 
                        TotalTimeInBed = c(346L, 407L, 442L, 367L, 712L, 320L)), 
                   row.names = c(NA, 6L), class = "data.frame")
sleep <- sleep |> mutate(SleepDay = as.Date(SleepDay, format = "%m/%d/%Y %H:%M:%S"))
class(sleep$SleepDay)
#> [1] "Date"
print(sleep$SleepDay)
#> [1] "2016-04-12" "2016-04-13" "2016-04-15" "2016-04-16" "2016-04-17"
#> [6] "2016-04-19"

Created on 2022-05-16 by the reprex package (v2.0.1)

1 Like

Thank you so much for you help. That seemed to change the class to date for both data fames and I was able to add the new Weekday column in Activity2. Question though... I attempted to rename ActivityDate and SleepDate to just Date and when I run the code I can see the output in the console but when I click on the dataframe from the global environment it opens to view it and the changes arent there. Is there something I need to be doing to save it? I dont understand why that is.

activity2 %>%
  rename(Date = ActivityDate)

sleep %>%
  rename(Date = SleepDay)

I get this first
sleep %>%

  • rename(Date = SleepDay)
    Id Date TotalSleepRecords TotalMinutesAsleep TotalTimeInBed
    1 1503960366 2016-04-12 1 327 346
    2 1503960366 2016-04-13 2 384 407

Then run
head(sleep, 2) and get this

head(sleep, 2)
          Id   SleepDay TotalSleepRecords TotalMinutesAsleep TotalTimeInBed
1 1503960366 2016-04-12                 1                327            346
2 1503960366 2016-04-13                 2                384            407

Why arent the data frames saving the rename of the column?

The rename function takes in your data frame, changes the column name and then the function returns the modified data frame. You have to store the data frame that is returned. If you want to store it with the same name, use

activity2 <- activity2 %>%
  rename(Date = ActivityDate)

Hi! I wanted to share a big thanks for helping me. I was able to change the column names and move forward. I am hoping you can help me a little more. I am working on my analysis part, using the same dataframe above and I really want to plot something that shows the different levels of activity distance (given the total) based on weekday &/or calories burned. Im wanting to have plots that show higher intensity activity burns more calories or something of that effect and I also want to see the breakdown of the amount of active minutes per day in each category. I did this so far but its not really giving me what I am looking for.

ggplot(dailymerged, aes(x= Weekday, y= c("LightlyActiveDistance"), color= Weekday))+
  geom_col()

ggplot(dailymerged, aes(x= Weekday, y= c("ModeratelyActiveDistance"), color= Weekday))+
  geom_col()

ggplot(dailymerged, aes(x= Weekday, y= c("VeryActiveDistance"), color= Weekday))+
  geom_col()

Howcan I change this to get the effects that Im looking for. I know there has to be a simple way and Im aware of facet_wrap but not sure exactly how to use it for what Im trying to achieve.
Thanks,
Kathryn

Could you please post this most recent question in a new thread? It is entirely different than your original question and having a single question in each thread helps keep the forum organized. Be sure to include the data from earlier in the thread. I would move the question myself but I do not have that authority. Thanks!
I will be working on an answer in the meantime.

I posted a new topic. :slight_smile:

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.