Lubridate function does not change variable type from character to datetime

Hi all,

This is my first post in the R community, and it is really great to meet everyone here!

I'm facing a problem where I could really use some help. I have read in a CSV file in which two datetime columns (started_at and ended_at) come in as character. I used a lubridate function to convert them to datetime, but when I check their type afterwards, they are still character. Am I missing something here?

I have uploaded the file, and the R code is pasted below for your reference.

library(tidyverse)
library(lubridate)
library(ggplot2)
library(magrittr)

Oct2021 <- read_csv("202110-divvy-tripdata.csv")
str(Oct2021)

#Output
spec_tbl_df [631,226 x 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ ride_id           : chr [1:631226] "620BC6107255BF4C" "4471C70731AB2E45" "26CA69D43D15EE14" "362947F0437E1514" ...
 $ rideable_type     : chr [1:631226] "electric_bike" "electric_bike" "electric_bike" "electric_bike" ...
 $ started_at        : chr [1:631226] "22-10-21 12:46" "21-10-21 9:12" "16-10-21 16:28" "16-10-21 16:17" ...
 $ ended_at          : chr [1:631226] "22-10-21 12:49" "21-10-21 9:14" "16-10-21 16:36" "16-10-21 16:19" ...
 $ start_station_name: chr [1:631226] "Kingsbury St & Kinzie St" NA NA NA ...
 $ start_station_id  : chr [1:631226] "KA1503000043" NA NA NA ...
 $ end_station_name  : chr [1:631226] NA NA NA NA ...
 $ end_station_id    : chr [1:631226] NA NA NA NA ...
 $ start_lat         : num [1:631226] 41.9 41.9 41.9 41.9 41.9 ...
 $ start_lng         : num [1:631226] -87.6 -87.7 -87.7 -87.7 -87.7 ...
 $ end_lat           : num [1:631226] 41.9 41.9 41.9 41.9 41.9 ...
 $ end_lng           : num [1:631226] -87.6 -87.7 -87.7 -87.7 -87.7 ...
 $ member_casual     : chr [1:631226] "member" "member" "member" "member" ...
 - attr(*, "spec")=
  .. cols(
  ..   ride_id = col_character(),
  ..   rideable_type = col_character(),
  ..   started_at = col_character(),
  ..   ended_at = col_character(),
  ..   start_station_name = col_character(),
  ..   start_station_id = col_character(),
  ..   end_station_name = col_character(),
  ..   end_station_id = col_character(),
  ..   start_lat = col_double(),
  ..   start_lng = col_double(),
  ..   end_lat = col_double(),
  ..   end_lng = col_double(),
  ..   member_casual = col_character()
  .. )
 - attr(*, "problems")=<externalptr>

Oct2021 %>%
mutate(started_at = lubridate::dmy_hm(started_at), ended_at = lubridate::dmy_hm(ended_at))

#Output
# A tibble: 631,226 x 13
   ride_id    rideable_type started_at          ended_at            start_station_na~ start_station_id end_station_name
   <chr>      <chr>         <dttm>              <dttm>              <chr>             <chr>            <chr>
 1 620BC6107~ electric_bike 2021-10-22 12:46:00 2021-10-22 12:49:00 Kingsbury St & K~ KA1503000043     NA
 2 4471C7073~ electric_bike 2021-10-21 09:12:00 2021-10-21 09:14:00 NA                NA               NA
 3 26CA69D43~ electric_bike 2021-10-16 16:28:00 2021-10-16 16:36:00 NA                NA               NA
 4 362947F04~ electric_bike 2021-10-16 16:17:00 2021-10-16 16:19:00 NA                NA               NA
 5 BB731DE2F~ electric_bike 2021-10-20 23:17:00 2021-10-20 23:26:00 NA                NA               NA
 6 7176307BB~ electric_bike 2021-10-21 16:57:00 2021-10-21 17:11:00 NA                NA               NA
 7 E965A0415~ electric_bike 2021-10-21 17:46:00 2021-10-21 17:49:00 NA                NA               NA
 8 E41D986E8~ electric_bike 2021-10-20 23:30:00 2021-10-20 23:38:00 NA                NA               NA
 9 E189D96E3~ electric_bike 2021-10-21 18:17:00 2021-10-21 18:24:00 NA                NA               NA
10 17019B8A4~ electric_bike 2021-10-06 18:47:00 2021-10-06 18:56:00 NA                NA               NA
# ... with 631,216 more rows, and 6 more variables: end_station_id <chr>, start_lat <dbl>, start_lng <dbl>,
#   end_lat <dbl>, end_lng <dbl>, member_casual <chr>

Double-checking the variable types again:

str(Oct2021)

#Output
spec_tbl_df [631,226 x 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ ride_id           : chr [1:631226] "620BC6107255BF4C" "4471C70731AB2E45" "26CA69D43D15EE14" "362947F0437E1514" ...
 $ rideable_type     : chr [1:631226] "electric_bike" "electric_bike" "electric_bike" "electric_bike" ...
 $ started_at        : chr [1:631226] "22-10-21 12:46" "21-10-21 9:12" "16-10-21 16:28" "16-10-21 16:17" ...
 $ ended_at          : chr [1:631226] "22-10-21 12:49" "21-10-21 9:14" "16-10-21 16:36" "16-10-21 16:19" ...
 $ start_station_name: chr [1:631226] "Kingsbury St & Kinzie St" NA NA NA ...
 $ start_station_id  : chr [1:631226] "KA1503000043" NA NA NA ...
 $ end_station_name  : chr [1:631226] NA NA NA NA ...
 $ end_station_id    : chr [1:631226] NA NA NA NA ...
 $ start_lat         : num [1:631226] 41.9 41.9 41.9 41.9 41.9 ...
 $ start_lng         : num [1:631226] -87.6 -87.7 -87.7 -87.7 -87.7 ...
 $ end_lat           : num [1:631226] 41.9 41.9 41.9 41.9 41.9 ...
 $ end_lng           : num [1:631226] -87.6 -87.7 -87.7 -87.7 -87.7 ...
 $ member_casual     : chr [1:631226] "member" "member" "member" "member" ...
 - attr(*, "spec")=
  .. cols(
  ..   ride_id = col_character(),
  ..   rideable_type = col_character(),
  ..   started_at = col_character(),
  ..   ended_at = col_character(),
  ..   start_station_name = col_character(),
  ..   start_station_id = col_character(),
  ..   end_station_name = col_character(),
  ..   end_station_id = col_character(),
  ..   start_lat = col_double(),
  ..   start_lng = col_double(),
  ..   end_lat = col_double(),
  ..   end_lng = col_double(),
  ..   member_casual = col_character()
  .. )
 - attr(*, "problems")=<externalptr>

Let me know if you need anything else from me or if any part is not explained clearly; I'm happy to elaborate. Any help is appreciated!

Thanks!

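Could your issue simply be that you haven't assigned the result of your mutate step back to an object in your global environment? mutate() returns a modified copy of the data; it does not change Oct2021 in place, so without an assignment the converted columns are only printed and then discarded.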

> library(tidyverse)
> 
> df = tibble(x = "22-10-21 12:46")
> 
> glimpse(df)
Rows: 1
Columns: 1
$ x <chr> "22-10-21 12:46"
> 
> df = df |> mutate(x = lubridate::dmy_hm(x))
> 
> glimpse(df)
Rows: 1
Columns: 1
$ x <dttm> 2021-10-22 12:46:00

That is, your code should read:

Oct2021 <- Oct2021 %>%
    mutate(started_at = lubridate::dmy_hm(started_at), 
           ended_at = lubridate::dmy_hm(ended_at))
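
As an alternative to converting after the fact, the columns could be parsed while the file is read. The following is only a minimal sketch under stated assumptions: the two-row sample file and the name `trips` are made up for illustration, and the format string `"%d-%m-%y %H:%M"` is my reading of the day-month-year layout shown in the post. I have not checked how this format treats hours without a leading zero (such as "9:12" in the real data), so it is worth inspecting `problems()` after reading the actual file.

```r
library(readr)

# Hypothetical two-row sample in the same day-month-year layout as the post
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "ride_id,started_at,ended_at",
  "620BC6107255BF4C,22-10-21 12:46,22-10-21 12:49",
  "4471C70731AB2E45,21-10-21 09:12,21-10-21 09:14"
), tmp)

# Declare the two datetime columns at read time; all other columns are still guessed
trips <- read_csv(
  tmp,
  col_types = cols(
    started_at = col_datetime(format = "%d-%m-%y %H:%M"),
    ended_at   = col_datetime(format = "%d-%m-%y %H:%M")
  )
)

class(trips$started_at)  # "POSIXct" "POSIXt"

# Lists any values that could not be parsed with the format above
problems(trips)
```

With this approach no separate mutate step is needed, so there is no result to forget to assign.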

Hi JackDavison,

Yes, you are right. After assigning the result of the mutate step back to Oct2021, I re-ran str() to check, but found something strange.

> str(Oct2021)
spec_tbl_df [631,226 x 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ ride_id           : chr [1:631226] "620BC6107255BF4C" "4471C70731AB2E45" "26CA69D43D15EE14" "362947F0437E1514" ...
 $ rideable_type     : chr [1:631226] "electric_bike" "electric_bike" "electric_bike" "electric_bike" ...
 $ started_at        : POSIXct[1:631226], format: "2021-10-22 12:46:00" "2021-10-21 09:12:00" "2021-10-16 16:28:00" "2021-10-16 16:17:00" ...
 $ ended_at          : POSIXct[1:631226], format: "2021-10-22 12:49:00" "2021-10-21 09:14:00" "2021-10-16 16:36:00" "2021-10-16 16:19:00" ...
 $ start_station_name: chr [1:631226] "Kingsbury St & Kinzie St" NA NA NA ...
 $ start_station_id  : chr [1:631226] "KA1503000043" NA NA NA ...
 $ end_station_name  : chr [1:631226] NA NA NA NA ...
 $ end_station_id    : chr [1:631226] NA NA NA NA ...
 $ start_lat         : num [1:631226] 41.9 41.9 41.9 41.9 41.9 ...
 $ start_lng         : num [1:631226] -87.6 -87.7 -87.7 -87.7 -87.7 ...
 $ end_lat           : num [1:631226] 41.9 41.9 41.9 41.9 41.9 ...
 $ end_lng           : num [1:631226] -87.6 -87.7 -87.7 -87.7 -87.7 ...
 $ member_casual     : chr [1:631226] "member" "member" "member" "member" ...
 - attr(*, "spec")=
  .. cols(
  ..   ride_id = col_character(),
  ..   rideable_type = col_character(),
  ..   started_at = col_character(),
  ..   ended_at = col_character(),
  ..   start_station_name = col_character(),
  ..   start_station_id = col_character(),
  ..   end_station_name = col_character(),
  ..   end_station_id = col_character(),
  ..   start_lat = col_double(),
  ..   start_lng = col_double(),
  ..   end_lat = col_double(),
  ..   end_lng = col_double(),
  ..   member_casual = col_character()
  .. )
 - attr(*, "problems")=<externalptr> 

As you can see, in the spec_tbl_df, 'started_at' and 'ended_at' are now both POSIXct. However, in the attr(*, "spec")= section, these two variables are still listed as col_character(). Any idea why this is so?

Thanks!

The spec comes from the read process, which created the character versions of those columns. When you mutated, you changed the columns away from the type they had at read time, but mutate() has no effect on the data frame's attributes, so the original spec remains. Do you need the attributes for any purpose? Usually they only serve to document how the file was read in, so you can ignore them.
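
To make that concrete, here is a small sketch, assuming a made-up one-row file and the hypothetical names `trips` and `plain`. It checks the live column class directly rather than reading the spec, and shows one way I know of to get a plain tibble without the readr attributes: subsetting with empty brackets. Whether the spec survives `mutate()` may depend on package versions; in the thread above it clearly did.

```r
library(readr)
library(dplyr)
library(lubridate)

# Hypothetical one-row sample mirroring the layout in the post
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "ride_id,started_at",
  "620BC6107255BF4C,22-10-21 12:46"
), tmp)

# Read the datetime column as character, then convert and assign back
trips <- read_csv(tmp, col_types = cols(started_at = col_character()))
trips <- trips %>% mutate(started_at = dmy_hm(started_at))

# The live column class is what later code actually works with
class(trips$started_at)

# spec() reports how the file was read, not the current column types
spec(trips)

# Subsetting with empty brackets returns a plain tibble without the readr attributes
plain <- trips[]
attr(plain, "spec")
```

Dropping the attributes is optional; it only matters if the leftover spec is confusing or gets in the way of comparing objects.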

Hi nirgrahamuk,

Thank you so much for the explanation! I'm much clearer on this now and will not let that message bother me too much!

Cheers!
