Crashing Message

Hi. My RStudio crashes when I try to combine several data frames into one. I have also received warnings that I have exceeded my project space and project hours, which led me to subscribe so I could keep learning RStudio. After the crash message, I get an error saying the columns could not be combined. How do I solve this error and avoid the crash?

> all_trips <- bind_rows(q2_2019, q3_2019, q4_2019, q1_2020)
Error in `bind_rows()`:
! Can't combine `..1$ride_id` <double> and `..3$ride_id` <character>.
Run `rlang::last_error()` to see where the error occurred.
Session restored from your saved work on 2023-Jan-22 11:18:17 UTC (41 minutes ago)

Welcome to the forum.

It would really help to have more of your code and some sample data (see below), but it looks like your data frames store the same column as different types. That is what

! Can't combine `..1$ride_id` <double> and `..3$ride_id` <character>.

is telling you: `..1$ride_id` (the ride_id column of the first data frame) is numeric, `..3$ride_id` (the ride_id column of the third) is character, and bind_rows() cannot stack columns of different types.
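A minimal sketch of the same error, using two made-up one-column tibbles (trips_a and trips_b are hypothetical stand-ins, not your actual data): the bind fails while the types differ and succeeds once ride_id has one type in both.

```r
library(dplyr)

# Hypothetical stand-ins: ride_id is <double> in one frame and <character>
# in the other, like q2_2019 versus q4_2019 after as.character().
trips_a <- tibble(ride_id = c(22178529, 22178530))
trips_b <- tibble(ride_id = c("25223640", "25223641"))

# Stacking them as-is fails with a "Can't combine" error; catch it to inspect.
bind_error <- tryCatch(bind_rows(trips_a, trips_b), error = function(e) e)
cat(conditionMessage(bind_error), "\n")

# Giving ride_id the same type in both frames lets bind_rows() stack them.
combined <- bind_rows(
  mutate(trips_a, ride_id = as.character(ride_id)),
  trips_b
)
print(combined)
```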

A handy way to supply sample data is the dput() function. Run dput(mydata), where mydata is your data, then copy the output and paste it here. For a large dataset, something like dput(head(mydata, 100)) should supply the data we need.
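As a rough illustration of why dput() output is useful, here is a sketch with a small hypothetical data frame (sample_trips is made up): the printed text is itself R code, so anyone who pastes it into R gets the same data back.

```r
# Hypothetical stand-in for one of the trip tables.
sample_trips <- data.frame(
  ride_id = c(22178529, 22178530, 22178531),
  member_casual = c("Subscriber", "Subscriber", "Customer")
)

# dput() prints R code that rebuilds the object; head() keeps the paste short.
dput(head(sample_trips, 100))

# Evaluating that printed code (as a helper would after pasting it) recreates the data.
dput_text <- capture.output(dput(head(sample_trips, 100)))
rebuilt <- eval(parse(text = dput_text))
print(rebuilt)
```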

When you read a file into memory, R guesses the column types by sampling the file's contents, but sometimes it guesses wrong. In those cases you can solve the problem by specifying the data type of each column manually. How to do this depends on the command you are using to read the files, so please follow the advice above and try to provide a proper reproducible example.
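If the files are read with readr's read_csv(), one way to set types manually is the col_types argument. A minimal sketch, assuming a tiny hypothetical CSV written to a temp file in place of Divvy_Trips_2019_Q4.csv: the ID columns are forced to character and everything else is still guessed.

```r
library(readr)

# Hypothetical stand-in CSV with a few of the Q4 2019 column names.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "trip_id,bikeid,usertype",
  "25223640,2215,Subscriber",
  "25223641,6328,Subscriber"
), csv_path)

# Force the ID columns to character at read time instead of letting readr
# guess <dbl>; columns not listed in cols() are still guessed.
q4_sample <- read_csv(
  csv_path,
  col_types = cols(
    trip_id = col_character(),
    bikeid  = col_character()
  )
)

print(spec(q4_sample))
```

With the real file, the same idea would look like read_csv("Divvy_Trips_2019_Q4.csv", col_types = cols(trip_id = col_character(), bikeid = col_character())); the other quarters would need matching choices so that every ride_id ends up with the same type.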

Hi. Thank you for your response. So here is the code that led to the error.

q4_2019 <- mutate(q4_2019, ride_id = as.character(ride_id)
+                   ,rideable_type = as.character(rideable_type)) 
> View(q3_2019)
> View(q2_2019)
> View(q1_2020)
> View(all_trips)
Error in View : object 'all_trips' not found
> all_trips <- bind_rows(q2_2019, q3_2019, q4_2019, q1_2020)
Error in `bind_rows()`:
! Can't combine `..1$ride_id` <double> and `..3$ride_id` <character>.
Run `rlang::last_error()` to see where the error occurred.
> rlang::last_error()
<error/vctrs_error_ptype2>
Error in `bind_rows()`:
! Can't combine `..1$ride_id` <double> and `..3$ride_id` <character>.

I am supposed to rename the columns in the data frames q2_2019, q3_2019, and q4_2019 so they have the same column names as the data frame q1_2020. How do I change the columns manually in all the datasets to match the columns of q1_2020 so that I can combine them all into one data frame?
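One compact way to do this renaming is a single lookup vector passed to dplyr's rename() with all_of(). This is a sketch: name_lookup is a hypothetical helper, and q4_sample is a one-row tibble built from the first q4_2019 row posted in this thread.

```r
library(dplyr)

# Lookup vector: new name (q1_2020 style) = old name (q3_2019/q4_2019 style).
name_lookup <- c(
  ride_id            = "trip_id",
  rideable_type      = "bikeid",
  started_at         = "start_time",
  ended_at           = "end_time",
  start_station_name = "from_station_name",
  start_station_id   = "from_station_id",
  end_station_name   = "to_station_name",
  end_station_id     = "to_station_id",
  member_casual      = "usertype"
)

# One-row stand-in with the q3_2019/q4_2019 column names.
q4_sample <- tibble(
  trip_id = 25223640,
  start_time = as.POSIXct("2019-10-01 00:01:39", tz = "UTC"),
  end_time = as.POSIXct("2019-10-01 00:17:20", tz = "UTC"),
  bikeid = 2215,
  tripduration = 940,
  from_station_id = 20,
  from_station_name = "Sheffield Ave & Kingsbury St",
  to_station_id = 309,
  to_station_name = "Leavitt St & Armitage Ave",
  usertype = "Subscriber",
  gender = "Male",
  birthyear = 1987
)

# all_of() with a named vector renames every listed column in one call;
# any_of() would instead skip names that are missing from a data frame.
q4_renamed <- rename(q4_sample, all_of(name_lookup))
print(names(q4_renamed))
```

The same name_lookup should work for q3_2019, since it shares q4_2019's column names; q2_2019 would need its own lookup built from its long column names. Renaming alone does not fix the ride_id type mismatch.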

That is not the code that is causing the problem. As I said, the data types are not being assigned correctly when the data is loaded into memory. What function are you using to read the data into memory, and where is the data coming from?

We really need a reproducible example; see the FAQ: How to do a minimal reproducible example (reprex) for beginners

We have no idea what data.frames q2_2019 or q1_2020 are.

Okay guys, I am really new to R, so here is the whole project...

* DONE (tidyverse)

The downloaded source packages are in
	‘/tmp/Rtmp4zNiXN/downloaded_packages’
> install.packages("lubridate")
Installing package into ‘/cloud/lib/x86_64-pc-linux-gnu-library/4.2’
(as ‘lib’ is unspecified)
trying URL 'http://rspm/default/__linux__/focal/latest/src/contrib/lubridate_1.9.0.tar.gz'
Content type 'application/x-gzip' length 960179 bytes (937 KB)
==================================================
downloaded 937 KB

* installing *binary* package ‘lubridate’ ...
* DONE (lubridate)

The downloaded source packages are in
	‘/tmp/Rtmp4zNiXN/downloaded_packages’
> install.packages("ggplot")
Installing package into ‘/cloud/lib/x86_64-pc-linux-gnu-library/4.2’
(as ‘lib’ is unspecified)
Warning in install.packages :
  package ‘ggplot’ is not available for this version of R

A version of this package for your version of R might be available elsewhere,
see the ideas at
https://cran.r-project.org/doc/manuals/r-patched/R-admin.html#Installing-packages
> library(tidyverse)
── Attaching packages ──────────────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.0      ✔ purrr   1.0.1 
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.2.1      ✔ stringr 1.5.0 
✔ readr   2.1.3      ✔ forcats 0.5.2 
── Conflicts ─────────────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
> library(lubridate)
Loading required package: timechange

Attaching package: ‘lubridate’

The following objects are masked from ‘package:base’:

    date, intersect, setdiff, union

> library(ggplot2)
> getwd()
[1] "/cloud/project"
> setwd()
Error in setwd() : argument "dir" is missing, with no default
> setwd(Users/estherwairimu/Desktop/Divvy_Exercise)
Error in setwd(Users/estherwairimu/Desktop/Divvy_Exercise) : 
  object 'Users' not found
> setwd(project/estherwairimu/Desktop/Divvy_Exercise)
Error in setwd(project/estherwairimu/Desktop/Divvy_Exercise) : 
  object 'project' not found
> setwd(estherwairimu/Desktop/Divvy_Exercise)
Error in setwd(estherwairimu/Desktop/Divvy_Exercise) : 
  object 'estherwairimu' not found
> setwd(dir/project/CapstoneProject/estherwairimu/Desktop/Divvy_Exercise)
Error in setwd(dir/project/CapstoneProject/estherwairimu/Desktop/Divvy_Exercise) : 
  object 'project' not found
> setwd("R:CapstoneProject")
> setwd(R/CapstoneProject/estherwairimu)
Error in setwd(R/CapstoneProject/estherwairimu) : object 'R' not found
> setwd("/cloud/project/R:CapstoneProject")
> get()
Error in get() : argument "x" is missing, with no default
> getwd()
[1] "/cloud/project/R:CapstoneProject"
> setwd("/cloud/project/R:CapstoneProject")
> q2_2019 <- read_csv("Divvy_Trips_2019_Q2.csv")
Error: 'Divvy_Trips_2019_Q2.csv' does not exist in current working directory ('/cloud/project/R:CapstoneProject').
> q2_2019 <- read_csv("Divvy_Trips_2019_Q2.csv")
Rows: 1108163 Columns: 12                                                                                                    
── Column specification ──────────────────────────────────────────────────────────────────
Delimiter: ","
chr  (4): 03 - Rental Start Station Name, 02 - Rental End Station Name, User Type, Mem...
dbl  (5): 01 - Rental Details Rental ID, 01 - Rental Details Bike ID, 03 - Rental Star...
num  (1): 01 - Rental Details Duration In Seconds Uncapped
dttm (2): 01 - Rental Details Local Start Time, 01 - Rental Details Local End Time

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
> q3_2019 <- read_csv("Divvy_Trips_2019_Q3.csv")
Rows: 1640718 Columns: 12                                                                                                    
── Column specification ──────────────────────────────────────────────────────────────────
Delimiter: ","
chr  (4): from_station_name, to_station_name, usertype, gender
dbl  (5): trip_id, bikeid, from_station_id, to_station_id, birthyear
num  (1): tripduration
dttm (2): start_time, end_time

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
> q4_2019 <- read_csv("Divvy_Trips_2019_Q4.csv")
Rows: 704054 Columns: 12                                                                                                     
── Column specification ──────────────────────────────────────────────────────────────────
Delimiter: ","
chr  (4): from_station_name, to_station_name, usertype, gender
dbl  (5): trip_id, bikeid, from_station_id, to_station_id, birthyear
num  (1): tripduration
dttm (2): start_time, end_time

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
> q1_2020 <- read_csv("Divvy_Trips_2020_Q1.csv")
Rows: 426887 Columns: 13                                                                
── Column specification ──────────────────────────────────────────────────────────────────
Delimiter: ","
chr  (5): ride_id, rideable_type, start_station_name, end_station_name, member_casual
dbl  (6): start_station_id, end_station_id, start_lat, start_lng, end_lat, end_lng
dttm (2): started_at, ended_at

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
> colnames(q3_2019)
 [1] "trip_id"           "start_time"        "end_time"          "bikeid"           
 [5] "tripduration"      "from_station_id"   "from_station_name" "to_station_id"    
 [9] "to_station_name"   "usertype"          "gender"            "birthyear"        
> colnames(q4_2019)
 [1] "trip_id"           "start_time"        "end_time"          "bikeid"           
 [5] "tripduration"      "from_station_id"   "from_station_name" "to_station_id"    
 [9] "to_station_name"   "usertype"          "gender"            "birthyear"        
> colnames(q2_2019)
 [1] "01 - Rental Details Rental ID"                   
 [2] "01 - Rental Details Local Start Time"            
 [3] "01 - Rental Details Local End Time"              
 [4] "01 - Rental Details Bike ID"                     
 [5] "01 - Rental Details Duration In Seconds Uncapped"
 [6] "03 - Rental Start Station ID"                    
 [7] "03 - Rental Start Station Name"                  
 [8] "02 - Rental End Station ID"                      
 [9] "02 - Rental End Station Name"                    
[10] "User Type"                                       
[11] "Member Gender"                                   
[12] "05 - Member Details Member Birthday Year"        
> colnames(q1_2020)
 [1] "ride_id"            "rideable_type"      "started_at"         "ended_at"          
 [5] "start_station_name" "start_station_id"   "end_station_name"   "end_station_id"    
 [9] "start_lat"          "start_lng"          "end_lat"            "end_lng"           
[13] "member_casual"     
> (q4_2019 <- rename(q4_2019 
+                    ,ride_id = trip_id
+                    ,rideable_type = bikeid
+                    ,started_at = start_time
+                    ,ended_at = end_time
+                    ,start_station_name = from_station_name
+                    ,start_station_id = from_station_id
+                    ,end_station_name = to_station_name
+                    ,end_station_id = to_station_id
+                    ,member_casual = usertype))
# A tibble: 704,054 × 12
   ride_id started_at          ended_at            ridea…¹ tripd…² start…³ start…⁴ end_s…⁵
     <dbl> <dttm>              <dttm>                <dbl>   <dbl>   <dbl> <chr>     <dbl>
 1  2.52e7 2019-10-01 00:01:39 2019-10-01 00:17:20    2215     940      20 Sheffi…     309
 2  2.52e7 2019-10-01 00:02:16 2019-10-01 00:06:34    6328     258      19 Throop…     241
 3  2.52e7 2019-10-01 00:04:32 2019-10-01 00:18:43    3003     850      84 Milwau…     199
 4  2.52e7 2019-10-01 00:04:32 2019-10-01 00:43:43    3275    2350     313 Lakevi…     290
 5  2.52e7 2019-10-01 00:04:34 2019-10-01 00:35:42    5294    1867     210 Ashlan…     382
 6  2.52e7 2019-10-01 00:04:38 2019-10-01 00:10:51    1891     373     156 Clark …     226
 7  2.52e7 2019-10-01 00:04:52 2019-10-01 00:22:45    1061    1072      84 Milwau…     142
 8  2.52e7 2019-10-01 00:04:57 2019-10-01 00:29:16    1274    1458     156 Clark …     463
 9  2.52e7 2019-10-01 00:05:20 2019-10-01 00:29:18    6011    1437     156 Clark …     463
10  2.52e7 2019-10-01 00:05:20 2019-10-01 02:23:46    2957    8306     336 Cottag…     336
# … with 704,044 more rows, 4 more variables: end_station_name <chr>,
#   member_casual <chr>, gender <chr>, birthyear <dbl>, and abbreviated variable names
#   ¹​rideable_type, ²​tripduration, ³​start_station_id, ⁴​start_station_name,
#   ⁵end_station_id
# ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
> (q3_2019 <- rename(q3_2019
+                    ,ride_id = trip_id
+                    ,rideable_type = bikeid
+                    ,started_at = start_time
+                    ,ended_at = end_time
+                    ,start_station_name = from_station_name
+                    ,start_station_id = from_station_id
+                    ,end_station_name = to_station_name
+                    ,end_station_id = to_station_id
+                    ,member_casual = usertype))
# A tibble: 1,640,718 × 12
   ride_id started_at          ended_at            ridea…¹ tripd…² start…³ start…⁴ end_s…⁵
     <dbl> <dttm>              <dttm>                <dbl>   <dbl>   <dbl> <chr>     <dbl>
 1  2.35e7 2019-07-01 00:00:27 2019-07-01 00:20:41    3591    1214     117 Wilton…     497
 2  2.35e7 2019-07-01 00:01:16 2019-07-01 00:18:44    5353    1048     381 Wester…     203
 3  2.35e7 2019-07-01 00:01:48 2019-07-01 00:27:42    6180    1554     313 Lakevi…     144
 4  2.35e7 2019-07-01 00:02:07 2019-07-01 00:27:10    5540    1503     313 Lakevi…     144
 5  2.35e7 2019-07-01 00:02:13 2019-07-01 00:22:26    6014    1213     168 Michig…      62
 6  2.35e7 2019-07-01 00:02:21 2019-07-01 00:07:31    4941     310     300 Broadw…     232
 7  2.35e7 2019-07-01 00:02:24 2019-07-01 00:23:12    3770    1248     168 Michig…      62
 8  2.35e7 2019-07-01 00:02:26 2019-07-01 00:28:16    5442    1550     313 Lakevi…     144
 9  2.35e7 2019-07-01 00:02:34 2019-07-01 00:28:57    2957    1583      43 Michig…     195
10  2.35e7 2019-07-01 00:02:45 2019-07-01 00:29:14    6091    1589      43 Michig…     195
# … with 1,640,708 more rows, 4 more variables: end_station_name <chr>,
#   member_casual <chr>, gender <chr>, birthyear <dbl>, and abbreviated variable names
#   ¹​rideable_type, ²​tripduration, ³​start_station_id, ⁴​start_station_name,
#   ⁵end_station_id
# ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
> (q2_2019 <- rename(q2_2019
+                    ,ride_id = "01 - Rental Details Rental ID"
+                    ,rideable_type = "01 - Rental Details Bike ID"
+                    ,started_at = "01 - Rental Details Local Start Time"
+                    ,ended_at = "01 - Rental Details Local End Time"
+                    ,start_station_name = "03 - Rental Start Station Name"
+                    ,start_station_id = "03 - Rental Start Station ID"
+                    ,end_station_name = "02 - Rental End Station Name"
+                    ,end_station_id = "02 - Rental End Station ID"
+                    ,member_casual = "User Type"))
# A tibble: 1,108,163 × 12
   ride_id started_at          ended_at            ridea…¹ 01 - …² start…³ start…⁴ end_s…⁵
     <dbl> <dttm>              <dttm>                <dbl>   <dbl>   <dbl> <chr>     <dbl>
 1  2.22e7 2019-04-01 00:02:22 2019-04-01 00:09:48    6251     446      81 Daley …      56
 2  2.22e7 2019-04-01 00:03:02 2019-04-01 00:20:30    6226    1048     317 Wood S…      59
 3  2.22e7 2019-04-01 00:11:07 2019-04-01 00:15:19    5649     252     283 LaSall…     174
 4  2.22e7 2019-04-01 00:13:01 2019-04-01 00:18:58    4151     357      26 McClur…     133
 5  2.22e7 2019-04-01 00:19:26 2019-04-01 00:36:13    3270    1007     202 Halste…     129
 6  2.22e7 2019-04-01 00:19:39 2019-04-01 00:23:56    3123     257     420 Ellis …     426
 7  2.22e7 2019-04-01 00:26:33 2019-04-01 00:35:41    6418     548     503 Drake …     500
 8  2.22e7 2019-04-01 00:29:48 2019-04-01 00:36:11    4513     383     260 Kedzie…     499
 9  2.22e7 2019-04-01 00:32:07 2019-04-01 01:07:44    3280    2137     211 St. Cl…     211
10  2.22e7 2019-04-01 00:32:19 2019-04-01 01:07:39    5534    2120     211 St. Cl…     211
# … with 1,108,153 more rows, 4 more variables: end_station_name <chr>,
#   member_casual <chr>, `Member Gender` <chr>,
#   `05 - Member Details Member Birthday Year` <dbl>, and abbreviated variable names
#   ¹​rideable_type, ²​`01 - Rental Details Duration In Seconds Uncapped`,
#   ³start_station_id, ⁴start_station_name, ⁵end_station_id
# ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
> str(q1_2020)
spc_tbl_ [426,887 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ ride_id           : chr [1:426887] "EACB19130B0CDA4A" "8FED874C809DC021" "789F3C21E472CA96" "C9A388DAC6ABF313" ...
 $ rideable_type     : chr [1:426887] "docked_bike" "docked_bike" "docked_bike" "docked_bike" ...
 $ started_at        : POSIXct[1:426887], format: "2020-01-21 20:06:59" "2020-01-30 14:22:39" ...
 $ ended_at          : POSIXct[1:426887], format: "2020-01-21 20:14:30" "2020-01-30 14:26:22" ...
 $ start_station_name: chr [1:426887] "Western Ave & Leland Ave" "Clark St & Montrose Ave" "Broadway & Belmont Ave" "Clark St & Randolph St" ...
 $ start_station_id  : num [1:426887] 239 234 296 51 66 212 96 96 212 38 ...
 $ end_station_name  : chr [1:426887] "Clark St & Leland Ave" "Southport Ave & Irving Park Rd" "Wilton Ave & Belmont Ave" "Fairbanks Ct & Grand Ave" ...
 $ end_station_id    : num [1:426887] 326 318 117 24 212 96 212 212 96 100 ...
 $ start_lat         : num [1:426887] 42 42 41.9 41.9 41.9 ...
 $ start_lng         : num [1:426887] -87.7 -87.7 -87.6 -87.6 -87.6 ...
 $ end_lat           : num [1:426887] 42 42 41.9 41.9 41.9 ...
 $ end_lng           : num [1:426887] -87.7 -87.7 -87.7 -87.6 -87.6 ...
 $ member_casual     : chr [1:426887] "member" "member" "member" "member" ...
 - attr(*, "spec")=
  .. cols(
  ..   ride_id = col_character(),
  ..   rideable_type = col_character(),
  ..   started_at = col_datetime(format = ""),
  ..   ended_at = col_datetime(format = ""),
  ..   start_station_name = col_character(),
  ..   start_station_id = col_double(),
  ..   end_station_name = col_character(),
  ..   end_station_id = col_double(),
  ..   start_lat = col_double(),
  ..   start_lng = col_double(),
  ..   end_lat = col_double(),
  ..   end_lng = col_double(),
  ..   member_casual = col_character()
  .. )
 - attr(*, "problems")=<externalptr> 
> str(q4_2019)
spc_tbl_ [704,054 × 12] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ ride_id           : num [1:704054] 25223640 25223641 25223642 25223643 25223644 ...
 $ started_at        : POSIXct[1:704054], format: "2019-10-01 00:01:39" "2019-10-01 00:02:16" ...
 $ ended_at          : POSIXct[1:704054], format: "2019-10-01 00:17:20" "2019-10-01 00:06:34" ...
 $ rideable_type     : num [1:704054] 2215 6328 3003 3275 5294 ...
 $ tripduration      : num [1:704054] 940 258 850 2350 1867 ...
 $ start_station_id  : num [1:704054] 20 19 84 313 210 156 84 156 156 336 ...
 $ start_station_name: chr [1:704054] "Sheffield Ave & Kingsbury St" "Throop (Loomis) St & Taylor St" "Milwaukee Ave & Grand Ave" "Lakeview Ave & Fullerton Pkwy" ...
 $ end_station_id    : num [1:704054] 309 241 199 290 382 226 142 463 463 336 ...
 $ end_station_name  : chr [1:704054] "Leavitt St & Armitage Ave" "Morgan St & Polk St" "Wabash Ave & Grand Ave" "Kedzie Ave & Palmer Ct" ...
 $ member_casual     : chr [1:704054] "Subscriber" "Subscriber" "Subscriber" "Subscriber" ...
 $ gender            : chr [1:704054] "Male" "Male" "Female" "Male" ...
 $ birthyear         : num [1:704054] 1987 1998 1991 1990 1987 ...
 - attr(*, "spec")=
  .. cols(
  ..   trip_id = col_double(),
  ..   start_time = col_datetime(format = ""),
  ..   end_time = col_datetime(format = ""),
  ..   bikeid = col_double(),
  ..   tripduration = col_number(),
  ..   from_station_id = col_double(),
  ..   from_station_name = col_character(),
  ..   to_station_id = col_double(),
  ..   to_station_name = col_character(),
  ..   usertype = col_character(),
  ..   gender = col_character(),
  ..   birthyear = col_double()
  .. )
 - attr(*, "problems")=<externalptr> 
> str(q3_2019)
spc_tbl_ [1,640,718 × 12] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ ride_id           : num [1:1640718] 23479388 23479389 23479390 23479391 23479392 ...
 $ started_at        : POSIXct[1:1640718], format: "2019-07-01 00:00:27" "2019-07-01 00:01:16" ...
 $ ended_at          : POSIXct[1:1640718], format: "2019-07-01 00:20:41" "2019-07-01 00:18:44" ...
 $ rideable_type     : num [1:1640718] 3591 5353 6180 5540 6014 ...
 $ tripduration      : num [1:1640718] 1214 1048 1554 1503 1213 ...
 $ start_station_id  : num [1:1640718] 117 381 313 313 168 300 168 313 43 43 ...
 $ start_station_name: chr [1:1640718] "Wilton Ave & Belmont Ave" "Western Ave & Monroe St" "Lakeview Ave & Fullerton Pkwy" "Lakeview Ave & Fullerton Pkwy" ...
 $ end_station_id    : num [1:1640718] 497 203 144 144 62 232 62 144 195 195 ...
 $ end_station_name  : chr [1:1640718] "Kimball Ave & Belmont Ave" "Western Ave & 21st St" "Larrabee St & Webster Ave" "Larrabee St & Webster Ave" ...
 $ member_casual     : chr [1:1640718] "Subscriber" "Customer" "Customer" "Customer" ...
 $ gender            : chr [1:1640718] "Male" NA NA NA ...
 $ birthyear         : num [1:1640718] 1992 NA NA NA NA ...
 - attr(*, "spec")=
  .. cols(
  ..   trip_id = col_double(),
  ..   start_time = col_datetime(format = ""),
  ..   end_time = col_datetime(format = ""),
  ..   bikeid = col_double(),
  ..   tripduration = col_number(),
  ..   from_station_id = col_double(),
  ..   from_station_name = col_character(),
  ..   to_station_id = col_double(),
  ..   to_station_name = col_character(),
  ..   usertype = col_character(),
  ..   gender = col_character(),
  ..   birthyear = col_double()
  .. )
 - attr(*, "problems")=<externalptr> 
> str(q2_2019)
spc_tbl_ [1,108,163 × 12] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ ride_id                                         : num [1:1108163] 22178529 22178530 22178531 22178532 22178533 ...
 $ started_at                                      : POSIXct[1:1108163], format: "2019-04-01 00:02:22" "2019-04-01 00:03:02" ...
 $ ended_at                                        : POSIXct[1:1108163], format: "2019-04-01 00:09:48" "2019-04-01 00:20:30" ...
 $ rideable_type                                   : num [1:1108163] 6251 6226 5649 4151 3270 ...
 $ 01 - Rental Details Duration In Seconds Uncapped: num [1:1108163] 446 1048 252 357 1007 ...
 $ start_station_id                                : num [1:1108163] 81 317 283 26 202 420 503 260 211 211 ...
 $ start_station_name                              : chr [1:1108163] "Daley Center Plaza" "Wood St & Taylor St" "LaSalle St & Jackson Blvd" "McClurg Ct & Illinois St" ...
 $ end_station_id                                  : num [1:1108163] 56 59 174 133 129 426 500 499 211 211 ...
 $ end_station_name                                : chr [1:1108163] "Desplaines St & Kinzie St" "Wabash Ave & Roosevelt Rd" "Canal St & Madison St" "Kingsbury St & Kinzie St" ...
 $ member_casual                                   : chr [1:1108163] "Subscriber" "Subscriber" "Subscriber" "Subscriber" ...
 $ Member Gender                                   : chr [1:1108163] "Male" "Female" "Male" "Male" ...
 $ 05 - Member Details Member Birthday Year        : num [1:1108163] 1975 1984 1990 1993 1992 ...
 - attr(*, "spec")=
  .. cols(
  ..   `01 - Rental Details Rental ID` = col_double(),
  ..   `01 - Rental Details Local Start Time` = col_datetime(format = ""),
  ..   `01 - Rental Details Local End Time` = col_datetime(format = ""),
  ..   `01 - Rental Details Bike ID` = col_double(),
  ..   `01 - Rental Details Duration In Seconds Uncapped` = col_number(),
  ..   `03 - Rental Start Station ID` = col_double(),
  ..   `03 - Rental Start Station Name` = col_character(),
  ..   `02 - Rental End Station ID` = col_double(),
  ..   `02 - Rental End Station Name` = col_character(),
  ..   `User Type` = col_character(),
  ..   `Member Gender` = col_character(),
  ..   `05 - Member Details Member Birthday Year` = col_double()
  .. )
 - attr(*, "problems")=<externalptr> 
> q4_2019 <- mutate(q4_2019, ride_id = as.character(ride_id)
+                   ,rideable_type = as.character(rideable_type)) 
> View(q3_2019)
> View(q2_2019)
> View(q1_2020)
> View(all_trips)
Error in View : object 'all_trips' not found
> all_trips <- bind_rows(q2_2019, q3_2019, q4_2019, q1_2020)
Error in `bind_rows()`:
! Can't combine `..1$ride_id` <double> and `..3$ride_id` <character>.
Run `rlang::last_error()` to see where the error occurred.
> rlang::last_error()
<error/vctrs_error_ptype2>
Error in `bind_rows()`:
! Can't combine `..1$ride_id` <double> and `..3$ride_id` <character>.
---
Backtrace:
 1. dplyr::bind_rows(q2_2019, q3_2019, q4_2019, q1_2020)
 4. vctrs::vec_rbind(!!!dots, .names_to = .id)
Run `rlang::last_trace()` to see the full context.
> View(all_trips)
Error in View : object 'all_trips' not found
> all_trips <- bind_rows(q2_2019, q3_2019, q4_2019, q1_2020)
Error in `bind_rows()`:
! Can't combine `..1$ride_id` <double> and `..3$ride_id` <character>.
Run `rlang::last_error()` to see where the error occurred.
Session restored from your saved work on 2023-Jan-22 22:09:34 UTC (17 minutes ago)

I used the dput() function...

> dput(head(q2_2019))
structure(list(ride_id = c(22178529, 22178530, 22178531, 22178532, 
22178533, 22178534), started_at = structure(c(1554076942, 1554076982, 
1554077467, 1554077581, 1554077966, 1554077979), tzone = "UTC", class = c("POSIXct", 
"POSIXt")), ended_at = structure(c(1554077388, 1554078030, 1554077719, 
1554077938, 1554078973, 1554078236), tzone = "UTC", class = c("POSIXct", 
"POSIXt")), rideable_type = c(6251, 6226, 5649, 4151, 3270, 3123
), `01 - Rental Details Duration In Seconds Uncapped` = c(446, 
1048, 252, 357, 1007, 257), start_station_id = c(81, 317, 283, 
26, 202, 420), start_station_name = c("Daley Center Plaza", "Wood St & Taylor St", 
"LaSalle St & Jackson Blvd", "McClurg Ct & Illinois St", "Halsted St & 18th St", 
"Ellis Ave & 55th St"), end_station_id = c(56, 59, 174, 133, 
129, 426), end_station_name = c("Desplaines St & Kinzie St", 
"Wabash Ave & Roosevelt Rd", "Canal St & Madison St", "Kingsbury St & Kinzie St", 
"Blue Island Ave & 18th St", "Ellis Ave & 60th St"), member_casual = c("Subscriber", 
"Subscriber", "Subscriber", "Subscriber", "Subscriber", "Subscriber"
), `Member Gender` = c("Male", "Female", "Male", "Male", "Male", 
"Male"), `05 - Member Details Member Birthday Year` = c(1975, 
1984, 1990, 1993, 1992, 1999)), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame"))
> dput(head(q3_2019))
structure(list(ride_id = c(23479388, 23479389, 23479390, 23479391, 
23479392, 23479393), started_at = structure(c(1561939227, 1561939276, 
1561939308, 1561939327, 1561939333, 1561939341), tzone = "UTC", class = c("POSIXct", 
"POSIXt")), ended_at = structure(c(1561940441, 1561940324, 1561940862, 
1561940830, 1561940546, 1561939651), tzone = "UTC", class = c("POSIXct", 
"POSIXt")), rideable_type = c(3591, 5353, 6180, 5540, 6014, 4941
), tripduration = c(1214, 1048, 1554, 1503, 1213, 310), start_station_id = c(117, 
381, 313, 313, 168, 300), start_station_name = c("Wilton Ave & Belmont Ave", 
"Western Ave & Monroe St", "Lakeview Ave & Fullerton Pkwy", "Lakeview Ave & Fullerton Pkwy", 
"Michigan Ave & 14th St", "Broadway & Barry Ave"), end_station_id = c(497, 
203, 144, 144, 62, 232), end_station_name = c("Kimball Ave & Belmont Ave", 
"Western Ave & 21st St", "Larrabee St & Webster Ave", "Larrabee St & Webster Ave", 
"McCormick Place", "Pine Grove Ave & Waveland Ave"), member_casual = c("Subscriber", 
"Customer", "Customer", "Customer", "Customer", "Subscriber"), 
    gender = c("Male", NA, NA, NA, NA, "Male"), birthyear = c(1992, 
    NA, NA, NA, NA, 1990)), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame"))
> dput(head(q4_2019))
structure(list(ride_id = c("25223640", "25223641", "25223642", 
"25223643", "25223644", "25223645"), started_at = structure(c(1569888099, 
1569888136, 1569888272, 1569888272, 1569888274, 1569888278), tzone = "UTC", class = c("POSIXct", 
"POSIXt")), ended_at = structure(c(1569889040, 1569888394, 1569889123, 
1569890623, 1569890142, 1569888651), tzone = "UTC", class = c("POSIXct", 
"POSIXt")), rideable_type = c("2215", "6328", "3003", "3275", 
"5294", "1891"), tripduration = c(940, 258, 850, 2350, 1867, 
373), start_station_id = c(20, 19, 84, 313, 210, 156), start_station_name = c("Sheffield Ave & Kingsbury St", 
"Throop (Loomis) St & Taylor St", "Milwaukee Ave & Grand Ave", 
"Lakeview Ave & Fullerton Pkwy", "Ashland Ave & Division St", 
"Clark St & Wellington Ave"), end_station_id = c(309, 241, 199, 
290, 382, 226), end_station_name = c("Leavitt St & Armitage Ave", 
"Morgan St & Polk St", "Wabash Ave & Grand Ave", "Kedzie Ave & Palmer Ct", 
"Western Ave & Congress Pkwy", "Racine Ave & Belmont Ave"), member_casual = c("Subscriber", 
"Subscriber", "Subscriber", "Subscriber", "Subscriber", "Subscriber"
), gender = c("Male", "Male", "Female", "Male", "Male", "Female"
), birthyear = c(1987, 1998, 1991, 1990, 1987, 1994)), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))
> dput(head(q1_2020))
structure(list(ride_id = c("EACB19130B0CDA4A", "8FED874C809DC021", 
"789F3C21E472CA96", "C9A388DAC6ABF313", "943BC3CBECCFD662", "6D9C8A6938165C11"
), rideable_type = c("docked_bike", "docked_bike", "docked_bike", 
"docked_bike", "docked_bike", "docked_bike"), started_at = structure(c(1579637219, 
1580394159, 1578598166, 1578327427, 1580373436, 1578659585), tzone = "UTC", class = c("POSIXct", 
"POSIXt")), ended_at = structure(c(1579637670, 1580394382, 1578598337, 
1578327956, 1580373768, 1578659874), tzone = "UTC", class = c("POSIXct", 
"POSIXt")), start_station_name = c("Western Ave & Leland Ave", 
"Clark St & Montrose Ave", "Broadway & Belmont Ave", "Clark St & Randolph St", 
"Clinton St & Lake St", "Wells St & Hubbard St"), start_station_id = c(239, 
234, 296, 51, 66, 212), end_station_name = c("Clark St & Leland Ave", 
"Southport Ave & Irving Park Rd", "Wilton Ave & Belmont Ave", 
"Fairbanks Ct & Grand Ave", "Wells St & Hubbard St", "Desplaines St & Randolph St"
), end_station_id = c(326, 318, 117, 24, 212, 96), start_lat = c(41.9665, 
41.9616, 41.9401, 41.8846, 41.8856, 41.8899), start_lng = c(-87.6884, 
-87.666, -87.6455, -87.6319, -87.6418, -87.6343), end_lat = c(41.9671, 
41.9542, 41.9402, 41.8918, 41.8899, 41.8846), end_lng = c(-87.6674, 
-87.6644, -87.653, -87.6206, -87.6343, -87.6446), member_casual = c("member", 
"member", "member", "member", "member", "member")), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

Thanks. The dput() data.frames all came through fine.

To be honest, the data layouts look just a bit messy. It is late and I will have to continue looking at things tomorrow.


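Your ride_id and rideable_type columns are the problem. Here is the first row of those two columns from each data frame, in the order q4_2019, q3_2019, q2_2019, q1_2020: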

[[1]]
# A tibble: 1 x 2
  ride_id  rideable_type
  <chr>    <chr>        
1 25223640 2215         

[[2]]
# A tibble: 1 x 2
   ride_id rideable_type
     <dbl>         <dbl>
1 23479388          3591

[[3]]
# A tibble: 1 x 2
   ride_id rideable_type
     <dbl>         <dbl>
1 22178529          6251

[[4]]
# A tibble: 1 x 2
  ride_id          rideable_type
  <chr>            <chr>        
1 EACB19130B0CDA4A docked_bike  
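To spot this kind of mismatch without scanning printed tibbles, a small check like the following may help. It is a sketch: the one-row tibbles in frames are stand-ins built from the values above, not the full data.

```r
library(dplyr)
library(purrr)

# One-row stand-ins using the values shown above.
frames <- list(
  q4_2019 = tibble(ride_id = "25223640", rideable_type = "2215"),
  q3_2019 = tibble(ride_id = 23479388, rideable_type = 3591),
  q2_2019 = tibble(ride_id = 22178529, rideable_type = 6251),
  q1_2020 = tibble(ride_id = "EACB19130B0CDA4A", rideable_type = "docked_bike")
)

# One named vector per column, showing its class in each data frame.
ride_id_class <- map_chr(frames, ~ class(.x$ride_id))
rideable_class <- map_chr(frames, ~ class(.x$rideable_type))
print(ride_id_class)
print(rideable_class)
```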

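You could fix it with the code below, but you may be losing information in rideable_type: in the 2019 files that column holds the bike ID (renamed from bikeid), whereas in q1_2020 it holds the bike type, such as docked_bike.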

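# combined ------------------
library(tidyverse)

# Convert the two mismatched columns to character in each data frame,
# then stack everything into a single tibble called all_trips.
all_trips <- map_df(list(q4_2019,
                         q3_2019,
                         q2_2019,
                         q1_2020),
                    ~ .x %>%
                      mutate(ride_id = as.character(ride_id),
                             rideable_type = as.character(rideable_type)))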
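With purrr 1.0 or later (the session above loads purrr 1.0.1), map_df() is superseded, and the suggested replacement is map() followed by list_rbind(). A sketch of the same fix in that style, using one-row stand-ins taken from the dput() output rather than the full data:

```r
library(dplyr)
library(purrr)  # list_rbind() was added in purrr 1.0.0

# One-row stand-ins for q4_2019, q3_2019, q2_2019 and q1_2020.
quarters <- list(
  tibble(ride_id = "25223640", rideable_type = "2215"),
  tibble(ride_id = 23479388, rideable_type = 3591),
  tibble(ride_id = 22178529, rideable_type = 6251),
  tibble(ride_id = "EACB19130B0CDA4A", rideable_type = "docked_bike")
)

# Convert both problem columns to character, then row-bind the list.
all_trips <- quarters %>%
  map(~ mutate(.x, across(c(ride_id, rideable_type), as.character))) %>%
  list_rbind()
print(all_trips)
```

The combined rideable_type column mixes bike IDs with bike types, which is the information-loss concern mentioned above.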



Essentially, it looks like your four data sets come from different sources. They have different numbers of columns, often different column names, and, where the names match, sometimes different data types. I renamed your data frames to reduce typing; see the table below. Q1 has 13 columns, while the other three have 12.
| Name    | New name | Number of columns |
|---------|----------|-------------------|
| q1_2020 | Q1       | 13                |
| q2_2019 | Q2       | 12                |
| q3_2019 | Q3       | 12                |
| q4_2019 | Q4       | 12                |
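A different number of columns is not a problem by itself: bind_rows() matches columns by name and fills any column a data frame lacks with NA. The catch is that columns not renamed to a shared name (for example Q2's `Member Gender` versus `gender` in Q3 and Q4) end up as separate, partly empty columns. A sketch with two hypothetical one-row samples:

```r
library(dplyr)

# Hypothetical one-row samples: gender exists only in the 2019 data,
# start_lat only in q1_2020.
q4_sample <- tibble(ride_id = "25223640", gender = "Male")
q1_sample <- tibble(ride_id = "EACB19130B0CDA4A", start_lat = 41.9665)

# bind_rows() matches columns by name and fills the gaps with NA.
combined <- bind_rows(q4_sample, q1_sample)
print(combined)
```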

Column names and data types

Q1

t(as.data.frame(sapply(Q1, class)))
                   [,1]        [,2]       
ride_id            "character" "character"
rideable_type      "character" "character"
started_at         "POSIXct"   "POSIXt"   
ended_at           "POSIXct"   "POSIXt"   
start_station_name "character" "character"
start_station_id   "numeric"   "numeric"  
end_station_name   "character" "character"
end_station_id     "numeric"   "numeric"  
start_lat          "numeric"   "numeric"  
start_lng          "numeric"   "numeric"  
end_lat            "numeric"   "numeric"  
end_lng            "numeric"   "numeric"  
member_casual      "character" "character"

Q2

t(as.data.frame(sapply(Q2, class)))
                                                  [,1]        [,2]       
ride_id                                           "numeric"   "numeric"  
started_at                                        "POSIXct"   "POSIXt"   
ended_at                                          "POSIXct"   "POSIXt"   
rideable_type                                     "numeric"   "numeric"  
X01...Rental.Details.Duration.In.Seconds.Uncapped "numeric"   "numeric"  
start_station_id                                  "numeric"   "numeric"  
start_station_name                                "character" "character"
end_station_id                                    "numeric"   "numeric"  
end_station_name                                  "character" "character"
member_casual                                     "character" "character"
Member.Gender                                     "character" "character"
X05...Member.Details.Member.Birthday.Year         "numeric"   "numeric" 

Q3

t(as.data.frame(sapply(Q3, class)))
                   [,1]        [,2]       
ride_id            "numeric"   "numeric"  
started_at         "POSIXct"   "POSIXt"   
ended_at           "POSIXct"   "POSIXt"   
rideable_type      "numeric"   "numeric"  
tripduration       "numeric"   "numeric"  
start_station_id   "numeric"   "numeric"  
start_station_name "character" "character"
end_station_id     "numeric"   "numeric"  
end_station_name   "character" "character"
member_casual      "character" "character"
gender             "character" "character"
birthyear          "numeric"   "numeric"  

Q4

t(as.data.frame(sapply(Q4, class)))
                   [,1]        [,2]       
ride_id            "character" "character"
started_at         "POSIXct"   "POSIXt"   
ended_at           "POSIXct"   "POSIXt"   
rideable_type      "character" "character"
tripduration       "numeric"   "numeric"  
start_station_id   "numeric"   "numeric"  
start_station_name "character" "character"
end_station_id     "numeric"   "numeric"  
end_station_name   "character" "character"
member_casual      "character" "character"
gender             "character" "character"
birthyear          "numeric"   "numeric"  
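The second column in each listing above is a repeat: POSIXct columns carry two classes (POSIXct and POSIXt), so sapply() returns a list and as.data.frame() recycles the one-class entries to fill two rows. A sketch that keeps only the first class and lines the frames up side by side makes mismatches easier to spot. first_class() is a hypothetical helper and the two small frames are made-up stand-ins:

```r
# Hypothetical helper: first class of every column in a data frame.
# Taking class(x)[1] keeps "POSIXct" and drops the duplicated "POSIXt".
first_class <- function(df) vapply(df, function(x) class(x)[1], character(1))

# Hypothetical stand-ins mirroring part of the Q1 and Q2 tables above
Q1 <- data.frame(ride_id = "EACB19130B0CDA4A",
                 started_at = as.POSIXct("2020-01-01 00:00:00", tz = "UTC"),
                 stringsAsFactors = FALSE)
Q2 <- data.frame(ride_id = 22178529,
                 started_at = as.POSIXct("2019-04-01 00:00:00", tz = "UTC"))

classes <- lapply(list(Q1 = Q1, Q2 = Q2), first_class)

# Line the classes up by column name; a column missing from a frame shows NA
all_cols <- unique(unlist(lapply(classes, names)))
comparison <- sapply(classes, function(cl) cl[all_cols])
rownames(comparison) <- all_cols
comparison

# Columns whose class differs between frames (missing columns ignored)
mismatch <- apply(comparison, 1, function(r) length(unique(na.omit(r))) > 1)
names(which(mismatch))
```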

In Q2 & Q3, ride_id is numeric, while in Q1 & Q4 it is character. rideable_type follows the same pattern.

Q1 has no gender column; Q2 has Member.Gender, and Q3 & Q4 have gender. Q1 has start_lat, start_lng, end_lat and end_lng, none of which appear anywhere else.

One column name in Q2, "01 - Rental Details Duration In Seconds Uncapped" (displayed as X01...Rental.Details.Duration.In.Seconds.Uncapped), is 48 characters long, including blank spaces, and has no column of the same name in the other data.frames.
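To list which columns are shared and which are not, intersect() and setdiff() on the column names are enough. A base-R sketch with hypothetical three-column stand-ins (frames, shared and extra are illustrative names). The last lines also show that base R's make.names() turns the 48-character name into exactly the X01... form seen in the Q2 listing, which suggests, though I can't confirm it, that this file was read with read.csv() and its default check.names = TRUE:

```r
# Hypothetical stand-ins using a few of the column names from the tables above
Q1 <- data.frame(ride_id = "a", start_lat = 41.9, member_casual = "member")
Q2 <- data.frame(ride_id = 1, Member.Gender = "Male", member_casual = "member")
Q3 <- data.frame(ride_id = 2, gender = "Female", member_casual = "casual")

frames <- list(Q1 = Q1, Q2 = Q2, Q3 = Q3)

# Columns present in every data frame
shared <- Reduce(intersect, lapply(frames, names))

# Columns each data frame has that are not shared by all of them
extra <- lapply(frames, function(df) setdiff(names(df), shared))
extra

# make.names() prefixes "X" to a leading digit and turns spaces/hyphens into "."
long_name <- make.names("01 - Rental Details Duration In Seconds Uncapped")
long_name
```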

As they stand, I do not see any way to do something like an rbind on these data.frames.
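For what it's worth, base rbind() and dplyr's bind_rows() behave differently on frames like these. A sketch, assuming dplyr, with two hypothetical frames (a and b) whose names only partly overlap: rbind() errors, while bind_rows() matches columns by name and fills the gaps with NA. So bind_rows() "works" once the types agree, but the result is a wide, sparse table rather than a clean stack:

```r
library(dplyr)

# Hypothetical stand-ins whose column names only partly overlap
a <- data.frame(ride_id = "x1", gender = "Male")
b <- data.frame(ride_id = "x2", start_lat = 41.9)

# Base rbind() needs matching column names, so this call fails
rbind_result <- tryCatch(rbind(a, b), error = function(e) e)

# bind_rows() stacks by name and fills columns missing from a frame with NA
stacked <- bind_rows(a, b)
stacked
```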


Thank you so much, William and John, for taking the time to troubleshoot my problem. I will discuss with my classmates how to raise this with our instructor and see how it goes. I am especially excited that I got a response from a programming forum and learned a new function (dput()).


Hi. Just a quick update. I went through each data frame to change the data types of its columns. Since the columns had to be renamed to match q1_2020, I also changed the data types in q2_2019, q3_2019, and q4_2019 to be consistent with q1_2020. Some columns were not needed for the analysis, so I removed them from each data frame. That is how the data frames finally stacked. We had a script, but I still did a lot of googling. Thanks.
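The workflow described above (rename to match q1_2020, convert the types, drop unneeded columns, then bind) can be sketched as follows, assuming dplyr. The old column names trip_id, bikeid and usertype are hypothetical placeholders, as are the one-row stand-in data:

```r
library(dplyr)

# Hypothetical stand-ins: q1_2020 already has the target names and types;
# q2_2019 has different (hypothetical) names, a numeric ride ID and an extra column.
q1_2020 <- tibble(ride_id = "EACB19130B0CDA4A",
                  rideable_type = "docked_bike",
                  member_casual = "member")
q2_2019 <- tibble(trip_id = 22178529,
                  bikeid = 6251,
                  usertype = "Subscriber",
                  Member.Gender = "Male")

q2_clean <- q2_2019 %>%
  # 1. Rename columns to match q1_2020 (new_name = old_name)
  rename(ride_id = trip_id,
         rideable_type = bikeid,
         member_casual = usertype) %>%
  # 2. Convert types to match q1_2020
  mutate(ride_id = as.character(ride_id),
         rideable_type = as.character(rideable_type)) %>%
  # 3. Keep only the columns needed for the analysis
  select(ride_id, rideable_type, member_casual)

# With names and types aligned, bind_rows() stacks the frames cleanly
all_trips <- bind_rows(q2_clean, q1_2020)
all_trips
```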

Here is the completed project https://www.kaggle.com/estherwimenje/capstoneproject-divvytrips-r

Thanks for getting back to us. That was a lot of work, but it looks good.

You may have learned a key principle of data analysis with real life data: Ninety percent of the project is data cleaning. But what you got was a real mess.

