bind_rows - R session was abnormally terminated due to an unexpected crash

FelipeA · September 3, 2022, 3:27pm

Hello,
I'm finishing the Coursera Google Data Analytics Professional Certificate course and I'm stuck with one problem on the case study.

The case study is about historical bicycle trips in Chicago. We use these 4 files, which can be downloaded here: Bucket loading...
[Divvy_Trips_2020_Q1.zip]
[Divvy_Trips_2019_Q4.zip]
[Divvy_Trips_2019_Q3.zip]
[Divvy_Trips_2019_Q2.zip]

The case study includes this script from Kevin Hartman to clean and prepare the data: Divvy Exercise R Script - Google Docs

I used the script above, adding a few lines to make sure that all datasets have the same number of columns with the same names.

library(tidyverse)  #helps wrangle data
library(lubridate)  #helps wrangle date attributes
library(ggplot2)  #helps visualize data
library(reprex)

#=====================
# STEP 1: COLLECT DATA
#=====================
# Upload Divvy datasets (csv files) here
q2_2019 <- read_csv("Divvy_Trips_2019_Q2.csv")
q3_2019 <- read_csv("Divvy_Trips_2019_Q3.csv")
q4_2019 <- read_csv("Divvy_Trips_2019_Q4.csv")
q1_2020 <- read_csv("Divvy_Trips_2020_Q1.csv")

#====================================================
# STEP 2: WRANGLE DATA AND COMBINE INTO A SINGLE FILE
#====================================================
# Compare the column names of each of the files
# While the names don't have to be in the same order, they DO need to match perfectly before we can use a command to join them into one file
colnames(q3_2019)
colnames(q4_2019)
colnames(q2_2019)
colnames(q1_2020)

# Rename columns to make them consistent with q1_2020 (as this will be the supposed going-forward table design for Divvy)

(q4_2019 <- rename(q4_2019
                   ,ride_id = trip_id
                   ,rideable_type = bikeid 
                   ,started_at = start_time  
                   ,ended_at = end_time  
                   ,start_station_name = from_station_name 
                   ,start_station_id = from_station_id 
                   ,end_station_name = to_station_name 
                   ,end_station_id = to_station_id 
                   ,member_casual = usertype))

(q3_2019 <- rename(q3_2019
                   ,ride_id = trip_id
                   ,rideable_type = bikeid 
                   ,started_at = start_time  
                   ,ended_at = end_time  
                   ,start_station_name = from_station_name 
                   ,start_station_id = from_station_id 
                   ,end_station_name = to_station_name 
                   ,end_station_id = to_station_id 
                   ,member_casual = usertype))

(q2_2019 <- rename(q2_2019
                   ,ride_id = "01 - Rental Details Rental ID"
                   ,rideable_type = "01 - Rental Details Bike ID" 
                   ,started_at = "01 - Rental Details Local Start Time"  
                   ,ended_at = "01 - Rental Details Local End Time"  
                   ,start_station_name = "03 - Rental Start Station Name" 
                   ,start_station_id = "03 - Rental Start Station ID"
                   ,end_station_name = "02 - Rental End Station Name" 
                   ,end_station_id = "02 - Rental End Station ID"
                   ,member_casual = "User Type"))

# Inspect the dataframes and look for incongruencies
str(q1_2020)
str(q4_2019)
str(q3_2019)
str(q2_2019)


# Convert ride_id and rideable_type to character so that they can stack correctly
q4_2019 <-  mutate(q4_2019, ride_id = as.character(ride_id)
                   ,rideable_type = as.character(rideable_type)) 
q3_2019 <-  mutate(q3_2019, ride_id = as.character(ride_id)
                   ,rideable_type = as.character(rideable_type)) 
q2_2019 <-  mutate(q2_2019, ride_id = as.character(ride_id)
                   ,rideable_type = as.character(rideable_type)) 

#drop the columns that are not useful  
q1_2020$start_lat <- NULL 
q1_2020$start_lng  <- NULL
q1_2020$end_lat  <- NULL
q1_2020$end_lng  <- NULL


q4_2019$gender <- NULL
q4_2019$birthyear <- NULL


q3_2019$gender <- NULL
q3_2019$birthyear <- NULL

q2_2019$`05 - Member Details Member Birthday Year` <- NULL 
q2_2019$`Member Gender`<- NULL 

#rename the column so it has the same name as in the other datasets
(q2_2019 <- rename(q2_2019
                   ,tripduration = "01 - Rental Details Duration In Seconds Uncapped"))


# Stack individual quarter's data frames into one big data frame
# q1_2020 is not included because it has one column fewer
all_trips <- bind_rows(q2_2019, q3_2019, q4_2019)



I tried running it many times, but it never works. I always get the error: "The previous R session was abnormally terminated due to an unexpected crash".

The files are imported and the datasets are created correctly. The issue occurs when I call bind_rows. I also tried rbind, and it didn't work either.
I also tried using the desktop software instead of RStudio Cloud, and that didn't work either.

andresrcs · September 3, 2022, 7:56pm

I think you are just running out of RAM. The free tier on RStudio Cloud has a 1 GB RAM limit, and you are loading a lot of files along with a lot of packages (since you are loading the full tidyverse).

I guess learning how to deal with this kind of situation is part of the intent of this exercise.
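
One way to check whether memory is the bottleneck is sketched below, using only base R. The data frames `q_a` and `q_b` are small made-up stand-ins for the quarterly tables, not the Divvy data. The idea is to measure each object with `object.size()`, then remove the inputs once they are stacked so that only one copy stays in memory.

```r
# Hypothetical stand-ins for two quarterly tables (not the real Divvy data)
q_a <- data.frame(ride_id = as.character(1:1000),
                  tripduration = runif(1000),
                  stringsAsFactors = FALSE)
q_b <- data.frame(ride_id = as.character(1001:2000),
                  tripduration = runif(1000),
                  stringsAsFactors = FALSE)

# Size of each object in megabytes
sizes_mb <- sapply(list(q_a = q_a, q_b = q_b),
                   function(x) as.numeric(object.size(x)) / 1024^2)
print(round(sizes_mb, 3))

# Stack the tables, then drop the inputs so only the combined copy remains
all_trips <- rbind(q_a, q_b)
rm(q_a, q_b)
invisible(gc())  # ask R to release memory that is no longer referenced
```

On the real files the sizes will be far larger, and comparing their total against the 1 GB limit gives a rough idea of whether stacking them can fit.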

FelipeA · September 4, 2022, 6:22pm

Update: it actually worked using the desktop software. I believe I just needed to update the columns to match.
Thank you
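
A minimal sketch of how the "columns match" check could be done before stacking, using base R only. The tables `old_q` and `new_q` are hypothetical, and their column names merely mimic the ones in the script above. It compares names with `setdiff()`, renames and converts the mismatched columns, and then confirms that both names and types agree.

```r
# Hypothetical stand-ins; the rows are made up
old_q <- data.frame(trip_id = 1:3,
                    usertype = c("Subscriber", "Customer", "Subscriber"),
                    stringsAsFactors = FALSE)
new_q <- data.frame(ride_id = c("A1", "B2"),
                    member_casual = c("member", "casual"),
                    stringsAsFactors = FALSE)

# Names present in the old table but missing from the new one
print(setdiff(names(old_q), names(new_q)))

# Rename to the new design and convert the id to character
names(old_q)[names(old_q) == "trip_id"] <- "ride_id"
names(old_q)[names(old_q) == "usertype"] <- "member_casual"
old_q$ride_id <- as.character(old_q$ride_id)

# Check that names and column classes now agree
same_names <- setequal(names(old_q), names(new_q))
same_types <- identical(sapply(old_q, class)[names(new_q)],
                        sapply(new_q, class))

# rbind() on data frames matches columns by name
stacked <- rbind(old_q, new_q)
```

If `same_names` or `same_types` comes back `FALSE`, printing `sapply(old_q, class)` next to `sapply(new_q, class)` shows which column still differs.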

andresrcs · September 4, 2022, 6:42pm

The desktop version is not limited to 1 GB of RAM; it can use all the available memory in your system. So most likely you were running out of RAM.
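
If memory is tight, one option is to skip unneeded columns at read time rather than dropping them after the whole file is loaded. Below is a sketch with base R's `read.csv()`, where a `colClasses` entry of `"NULL"` tells the reader to skip that column. The temporary file and its rows are made up for illustration. I believe readr's `read_csv()` has a `col_select` argument that serves a similar purpose, but check the documentation for your installed version.

```r
# Write a tiny hypothetical CSV to a temporary file (made-up rows)
path <- tempfile(fileext = ".csv")
writeLines(c("ride_id,started_at,start_lat,start_lng,member_casual",
             "A1,2020-01-01 10:00:00,41.9,-87.6,member",
             "B2,2020-01-02 11:30:00,41.8,-87.7,casual"),
           path)

# "NULL" in colClasses skips the column; unnamed columns are guessed as usual
trips <- read.csv(path,
                  colClasses = c(start_lat = "NULL", start_lng = "NULL"),
                  stringsAsFactors = FALSE)
print(names(trips))

unlink(path)  # clean up the temporary file
```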
