error while converting date/time column from character to date - google data analytics course

_enya · October 25, 2022, 11:51am

Hello everyone. This is my first time posting here. I am new to R. I am currently working on my capstone project from the google data analytics course. I am trying to convert date/time data from character to date format. Some of my dates are in "POSIXct. POSIxt" and some are in character format. I have been using the following code
df$date <- as.Date(df$started_at)
here is an example of the results. my dates change from 10/08/2021 17:15 (this is the character format) to 0010-08-20 (this is after running the code)
I tried using other functions such as strptime(), as.POSIXct() but they give me errors of their own for example the date column has NAs instead of dates or new dates that don't exist appear such as the year 2030 or 2013. My time frame is between 2021 - 2022 so I don't understand where 2030 is coming from. Here is my code from the beginning. Let me know what I am missing

```{r}
library(tidyverse)
library(lubridate)
library(ggplot2)
```

# importing data

```{r}
aug_2021<-read_csv("202108-divvy-tripdata.csv")
sept_2021<-read_csv("202109-divvy-tripdata.csv")
oct_2021<-read_csv("202110-divvy-tripdata.csv")
nov_2021<-read_csv("202111-divvy-tripdata.csv")
dec_2021<-read_csv("202112-divvy-tripdata.csv")
jan_2022<-read_csv("202201-divvy-tripdata.csv")
feb_2022<-read_csv("202202-divvy-tripdata.csv")
mar_2022<-read_csv("202203-divvy-tripdata.csv")
apr_2022<-read_csv("202204-divvy-tripdata.csv")
may_2022<-read_csv("202205-divvy-tripdata.csv")
jun_2022<-read_csv("202206-divvy-tripdata.csv")
jul_2022<-read_csv("202207-divvy-tripdata.csv")
aug_2022<-read_csv("202208-divvy-tripdata.csv")
```

# wrangle and combine data into a single file

```{r}
colnames(aug_2021)
colnames(sept_2021)
colnames(oct_2021)
colnames(nov_2021)
colnames(dec_2021)
colnames(jan_2022)
colnames(feb_2022)
colnames(mar_2022)
colnames(apr_2022)
colnames(may_2022)
colnames(jun_2022)
colnames(jul_2022)
colnames(aug_2022)
```

# inspecting the dataframes and looking for incongurencies

```{r}
str(aug_2021)
str(sept_2021)
str(oct_2021)
str(nov_2021)
str(dec_2021)
str(jan_2022)
str(feb_2022)
str(mar_2022)
str(apr_2022)
str(may_2022)
str(jun_2022)
str(jul_2022)
str(aug_2022)
```

# combining the data into one big data frame

```{r}
all_trips<-rbind(aug_2021,sept_2021,oct_2021,nov_2021,dec_2021,jan_2022,feb_2022,mar_2022,apr_2022,may_2022,jun_2022,jul_2022,aug_2022)
```

# removing unnecessary columns

```{r}
all_trips<-all_trips%>%
  select(-c(start_lat,start_lng,end_lat,end_lng,start_station_id,start_station_name,end_station_name,end_station_id))
```

# step 3 clean up and add data to prepare for analysis

# inspecting the new table that has been created

```{r}
colnames(all_trips)
nrow(all_trips)
dim(all_trips)
head(all_trips)
tail(all_trips)
str(all_trips)
summary(all_trips)
```

#check how many observations fall under each usertype

```{r}
table(all_trips$member_casual)
```

# Add columns that list the date, month, day, and year of each ride
#The default format is yyyy-mm-dd
# This will allow us to aggregate ride data for each month, day, or year. Before completing these operations we could only aggregate at the ride level

```{r}
all_trips$date <- as.Date(all_trips$started_at)
```

FJCC · October 25, 2022, 1:41pm

If you look at the Help for as.Date, you will find the Usage section that begins with

Usage
as.Date(x, ...)
## S3 method for class 'character'
as.Date(x, format, tryFormats = c("%Y-%m-%d", "%Y/%m/%d"),
        optional = FALSE, ...)

You can see that for character input with slashes, the function will guess that the format is %Y/%m/%d. That accounts for the result

> as.Date("10/08/2021 17:15")
[1] "0010-08-20"

where the 10 is treated as the year, the 08 and the month and the first two digits of 2021 as the day. If you give the function the format of the character date, it will return the correct value. If your dates are in day/month/year format,

as.Date("10/08/2021 17:15", format = "%d/%m/%Y %H:%M")
[1] "2021-08-10"

_enya · October 25, 2022, 2:35pm

It worked. Thank you so much!! Been struggling with this since yesterday. You helped me realize where I went wrong. Just realized that I have been reading the formats wrong. I thought %Y/%m/%d directly translated to 2021/08/10 and did not know that it was in reverse. Will work on learning how to read the formats correctly. Asante!

technocrat · October 25, 2022, 5:00pm

Also, you will find the {lubridate} package very helpful for all things date related lubridate.tidyverse.org

_enya · October 25, 2022, 5:21pm

Thank you. Just saw the cheat sheet very helpful.

_enya · October 25, 2022, 7:38pm

I have one more date-related question. The time difference between two columns (start_date and end_date) but I have to convert the columns to date/time format from character format. Tried using ymd_hms() and strptime(). Here are my codes and results: all_trips is my data frame.

This first code gives me dates outside my time frame. My time frame is 2021-2022. It returns 2030,2005,2016 etc

all_trips$start_date <- ymd_hms(all_trips$start_date,tz="UTC")
all_trips$end_date <- ymd_hms(all_trips$end_date,tz="UTC")

The second code using strptime() gives me NAs

all_trips$start_date <- strptime(all_trips$start_date,format="%Y/%m/%dT%H:%M:%S")
all_trips$end_date <- strptime(all_trips$end_date,format="%Y/%m/%dT%H:%M:%S")

The final code. This is what I have been instructed to use to find the time difference in seconds

all_trips$ride_length <- difftime(all_trips$ended_at,all_trips$started_at)

but when I run it, I get this error

Error in as.POSIXlt.character(x, tz, ...) :
character string is not in a standard unambiguous format
What am I missing?
forgot to add, this is my date/time order "19/08/2021 12:41"

technocrat · October 25, 2022, 8:11pm

Try lubridate::dmy_hm (default is ISO standard 2022-10-25 13:09 style)

a <- "19/08/2021 12:41"
(lubridate::dmy_hm(a)) |> str()
#>  POSIXct[1:1], format: "2021-08-19 12:41:00"

_enya · October 25, 2022, 8:20pm

what is |> str() and its purpose? Sorry I am really new to R

technocrat · October 25, 2022, 8:39pm

The |> symbol is a pipe which conveys the output of the previous function to another function, in which case str() and the purpose here is to show that the result is a datetime object, rather than character. It's similar to the %>% symbol that you will see in the tidyverse packages, but has a slightly different syntax.

(And it's really ok to be new; we don't bite.)

_enya · October 25, 2022, 8:50pm

I am having a difficult time understanding the lubridate function you suggested. Is there a way that I can still use strptime() or even as_datetime() to convert my DateTime from the character format to the date time format?

technocrat · October 25, 2022, 9:06pm

Sure. help("strptime") will give you everything needed. But the reason for lubridate is that the strptime syntax is clunky. If you look at the examples for as_datetime you will see that they use the hms syntax (as_datetime redirects to as_date).

The dmy_hm is pretty straightforward. Is the difficulty that you are looking to display in the "19/08/2021 12:41" form? That's a separate problem than being able to do date conversion for purposes of date calculations and should be done on the back end.

_enya · October 25, 2022, 9:13pm

I did it!! I was so close to crying I used the as_datetime(). Turns out I wasn't as detail-oriented as I thought I was.
The code

all_trips$start_date <- as_datetime(all_trips$start_date, format="%d/%m/%Y %H:%M")

I was writing the format part wrong . You are right, strptime() is clunky and lubridate functions are more straightforward. Thanks for walking with me on this issue. Greatly appreciated

system · December 6, 2022, 9:14pm

This topic was automatically closed 42 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.