Why can't I add a data frame column with mutate() when adding it individually works just fine?

There's probably a good reason for this, but I'm pretty new to R and it's beyond me!

I have a data frame with two datetime columns, started_at and ended_at.

I built a mutate() chain to extract various parts of them:

started <- cyc_trips$started_at
ended <- cyc_trips$ended_at

cyc_trips <- cyc_trips %>% mutate(
started_date = as.Date(started),
started_day = wday(started, label = TRUE),
started_month = month(started, label = TRUE),
started_time = as_hms(started),
started_hour = hour(started),
started_min = minute(started),
ended_time = as_hms(ended),
ended_hour = hour(ended),
ended_min = minute(ended),
trip_dur_sec = duration(as.numeric(ended - started))
)

When I add the line below to the chain, I get an error and a warning:

trip_dur_per = as.period(as.duration(interval(started, ended)))
+ )
Error in dimnames(x) <- dn :
length of 'dimnames' [1] not equal to array extent
In addition: Warning message:
In cbind(ride_id = c("4CA9676997DAFFF6", "F3E84A230AF2D676", "A1F2C92308007968",  :
number of rows of result is not a multiple of vector length (arg 1)

And yet, it works just fine when I create that column individually with the line below. I don't understand why there would be a discrepancy in the vector length. I dropped all rows with NAs or empty values earlier, so I don't suspect that's the cause.

cyc_trips$trip_dur_per <- as.period(as.duration(interval(started, ended)))
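For readers following along, here is a minimal sketch of what that expression computes, using one hypothetical trip whose times are copied from the reprex sample further down (UTC is assumed so daylight-saving shifts don't interfere). interval() anchors the span to the two timestamps, as.duration() turns it into an exact number of seconds, and as.period() re-expresses those seconds in clock units. The whole chain is vectorised, so n start/end pairs give n values, one per row.

```r
library(lubridate)

# One hypothetical trip (times copied from the reprex sample), in UTC
s <- as.POSIXct("2022-08-09 19:25:20", tz = "UTC")
e <- as.POSIXct("2022-08-09 20:17:29", tz = "UTC")

d <- as.duration(interval(s, e))  # exact elapsed time, stored in seconds
p <- as.period(d)                 # the same span expressed in clock units

as.numeric(d)          # 3129 seconds
period_to_seconds(p)   # 3129 again: the period covers the same span

# Vectorised: two start/end pairs give two periods, one per row
starts <- as.POSIXct(c("2022-08-09 19:25:20", "2022-09-18 15:35:43"), tz = "UTC")
ends   <- as.POSIXct(c("2022-08-09 20:17:29", "2022-09-18 15:49:31"), tz = "UTC")
p2 <- as.period(as.duration(interval(starts, ends)))
length(p2)             # 2
```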

Can anyone explain what's going on? Thank you!

You probably need to use rowwise() on the data frame being mutated so that the mutations are evaluated strictly per row.

Just curious, does this work? (It refers to the column names in the dataset instead of the global variables.)

cyc_trips %>% mutate(
started_date = as.Date(started_at),
started_day = wday(started_at, label = TRUE),
started_month = month(started_at, label = TRUE),
started_time = as_hms(started_at),
started_hour = hour(started_at),
started_min = minute(started_at),
ended_time = as_hms(ended_at),
ended_hour = hour(ended_at),
ended_min = minute(ended_at),
trip_dur_sec = duration(as.numeric(ended_at - started_at)),
trip_dur_per = as.period(as.duration(interval(started_at, ended_at)))
)

It doesn't look like it; I get the same error and warning:

> cyc_trips %>% mutate (trip_dur_per = as.period(as.duration(interval(
+   started_at, ended_at ))))
Error in dimnames(x) <- dn : 
  length of 'dimnames' [1] not equal to array extent
In addition: Warning message:
In cbind(ride_id = c("4CA9676997DAFFF6", "F3E84A230AF2D676", "A1F2C92308007968",  :
  number of rows of result is not a multiple of vector length (arg 1)

I haven't used rowwise() before. Would this be the correct syntax? (I'm trying to follow the dplyr reference.)

> cyc_trips %>% rowwise() %>% mutate (trip_dur_per = as.period(as.duration(interval(
+   started_at, ended_at
+ ))))

If so, I don't even get an error message; it just hangs. When I run the same command outside the mutate chain, it's pretty quick.

The syntax looks correct to me.
I'm happy to look into other solutions if you can provide a reprex.
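A likely reason for the hang (hedged: not measured on the real data) is that rowwise() makes mutate() evaluate the expression once per row, so millions of rows mean millions of separate lubridate calls, whereas interval() and friends are already vectorised and return one value per row from a single call. The sketch below uses two hypothetical trips and swaps in time_length() so the result is a plain number of seconds; it shows that the rowwise and vectorised forms give the same values, which is why the vectorised form is usually the one to prefer on a large frame.

```r
library(dplyr)
library(lubridate)

# Two hypothetical trips (times taken from the reprex sample), in UTC
trips <- tibble(
  started_at = as.POSIXct(c("2022-08-09 19:25:20", "2022-09-18 15:35:43"), tz = "UTC"),
  ended_at   = as.POSIXct(c("2022-08-09 20:17:29", "2022-09-18 15:49:31"), tz = "UTC")
)

# rowwise(): the expression is evaluated separately for each row
per_row <- trips %>%
  rowwise() %>%
  mutate(trip_secs = time_length(interval(started_at, ended_at), "second")) %>%
  ungroup()

# Vectorised: a single call handles every row at once
vectorised <- trips %>%
  mutate(trip_secs = time_length(interval(started_at, ended_at), "second"))
```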

I'd never made a reprex before, so thank you for that lesson! Of course, it also works in the reprex with this small sample of data.

library(tidyverse)
library(lubridate)
library(hms)

ride_id = c(
  "CFDB3C60259ACF09",
  "0544777CC1CFDDAF",
  "561E1D60E437D366",
  "9E0405472BD807AF",
  "5E8CA9E4B7FEE1DE"
)
started_at = as.POSIXct(
  c(
    "2022-08-09 19:25:20",
    "2022-09-18 15:35:43",
    "2022-08-16 20:58:41",
    "2022-06-11 16:40:22",
    "2022-05-20 20:54:26"
  )
)
ended_at = as.POSIXct(
  c(
    "2022-08-09 20:17:29",
    "2022-09-18 15:49:31",
    "2022-08-16 21:07:09",
    "2022-06-11 16:45:18",
    "2022-05-20 21:02:27"
  )
)
cyc_trips_sample <- data.table::data.table(ride_id, started_at, ended_at)

started <- cyc_trips_sample$started_at
ended <- cyc_trips_sample$ended_at

cyc_trips_sample <- cyc_trips_sample %>% mutate(
  started_date = as.Date(started),
  started_day = wday(started, label = TRUE),
  started_month = month(started, label = TRUE),
  started_time = as_hms(started),
  started_hour = hour(started),
  started_min = minute(started),
  ended_time = as_hms(ended),
  ended_hour = hour(ended),
  ended_min = minute(ended),
  trip_dur_sec = duration(as.numeric(ended - started)),
  trip_dur_per = as.period(as.duration(interval(started, ended)))
)

So somewhere in the millions of rows there's a problem, but with that many entries I have no idea how to find it.
Here's how I got rid of my empty observations and NAs:

cyc_trips <- cyc_trips %>% 
  mutate_if(is.character, ~na_if(.,""))

cyc_trips <- na.omit(cyc_trips)
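As a side note, mutate_if() still works but has been superseded since dplyr 1.0; the documented replacement is across() combined with where(). Wrapping each value in trimws() first is one way (an assumption about what should count as "empty" here) to also turn strings made only of spaces into NA, without retrying with different numbers of spaces. Be aware that trimws() also strips leading and trailing spaces from the non-blank values. A minimal sketch on made-up data:

```r
library(dplyr)

# Hypothetical toy data: one empty string and one whitespace-only string
df <- tibble(
  ride_id = c("A1", "", "C3"),
  station = c("Main St", "Oak Ave", "   "),
  dist_km = c(1.2, 3.4, 0.8)
)

cleaned <- df %>%
  # For every character column: trim whitespace, then turn "" into NA
  mutate(across(where(is.character), ~ na_if(trimws(.x), ""))) %>%
  # Drop any row that now contains an NA (same as the original na.omit() step)
  na.omit()

cleaned  # only the "A1" row remains
```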

Any ideas on how I could find / dump the problematic rows?
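Two hedged ideas for narrowing it down, sketched in base R with hypothetical helper names. suspicious_rows() flags rows where a trip has a missing timestamp or ends before it starts; these are common problems in trip data but are not confirmed as the cause here. failing_chunks() takes the opposite approach: it runs a function of your choice on consecutive slices of rows and reports which slices raise an error or a warning, so you can keep shrinking the slice until the offending rows are isolated. Both assume a data.frame or tibble with started_at and ended_at columns, and chunk_size = 1e5 is an arbitrary default.

```r
# Rows with a missing timestamp or a negative trip length (end before start)
suspicious_rows <- function(df) {
  s <- df[["started_at"]]
  e <- df[["ended_at"]]
  which(is.na(s) | is.na(e) | e < s)
}

# Run `f` on consecutive row chunks; return the first row number of every
# chunk where `f` raised an error or a warning
failing_chunks <- function(df, f, chunk_size = 1e5) {
  n <- nrow(df)
  starts <- seq(1, n, by = chunk_size)
  failed <- vapply(starts, function(first) {
    rows <- first:min(first + chunk_size - 1, n)
    tryCatch({
      f(df[rows, , drop = FALSE])
      FALSE
    },
    error = function(err) TRUE,
    warning = function(w) TRUE)
  }, logical(1))
  starts[failed]
}

# Possible usage on the real data (not run here):
# suspicious_rows(cyc_trips)
# failing_chunks(cyc_trips, function(d) dplyr::mutate(d,
#   trip_dur_per = lubridate::as.period(lubridate::as.duration(
#     lubridate::interval(started_at, ended_at)))))
```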

I think your error message mentions some ride_id values; I'd try finding those rows as a first step.

I thought that too and tried to find the culprit, but the code below should have caught any empty string and converted it to NA, since ride_id is a character column. That command looks for "" and I reran it with an increasing number of spaces. Checking with the sapply() call below after each new space, up to 20 or so, it never found any rows with NAs in either of the columns. :face_with_spiral_eyes:

cyc_trips <- cyc_trips %>% 
  mutate_if(is.character, ~na_if(.,""))

cyc_trips <- na.omit(cyc_trips)
sapply(cyc_trips, function(x) sum(is.na(x)))
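Rather than re-running with one more space each time, a regular expression can count every empty or whitespace-only string in one pass. A minimal base-R sketch (count_blank() is a hypothetical helper name): \s matches spaces, tabs and other whitespace, so "", " " and "   " are all counted, while real NAs are not, because grepl() returns FALSE for them.

```r
# Count empty or whitespace-only strings in every character column
count_blank <- function(df) {
  cols <- as.list(df)                                   # plain list of columns
  cols <- cols[vapply(cols, is.character, logical(1))]  # keep character columns only
  vapply(cols, function(x) sum(grepl("^\\s*$", x)), integer(1))
}

# Usage on the real data (not run here):
# count_blank(cyc_trips)
```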

When you filter your data for the ride IDs that show up in the error, is there anything different at all about those IDs or their associated columns?

I'm not sure why we're talking about IDs with spaces.

Oh sorry, the reason I brought that up is that na.omit() would have dropped them if they were NAs, but it won't drop empty strings. The mutate_if() command converts empty strings into NAs so that na.omit() can drop those rows.
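A tiny base-R illustration of that point, on made-up data: na.omit() only drops rows that contain NA, so a row whose ride_id is an empty string survives until it has been converted to NA.

```r
trips <- data.frame(ride_id = c("A1", "", NA), km = c(1.5, 2.0, 3.1),
                    stringsAsFactors = FALSE)

nrow(na.omit(trips))   # 2: only the NA row is dropped, the "" row stays

# Convert empty strings to NA, then na.omit() drops that row too
trips$ride_id[which(trips$ride_id == "")] <- NA
nrow(na.omit(trips))   # 1
```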

While cleaning up a different part of this data set, I found empty strings in some other columns that weren't getting dropped, and this turned out to be a quick way to get rid of them.

I just tried that. The three entries identified in the warning all have complete information.

If you include them along with a few other records in a reprex, does it show the error or not?
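One way to do that, sketched with dplyr: pull the three ride_id values quoted in full in the warning plus a few unrelated rows, then dput() the result, which prints R code that rebuilds those rows and can be pasted into a reprex. make_reprex_rows() is a hypothetical helper name and n_extra = 5 is an arbitrary choice. If cyc_trips is a data.table, converting with as.data.frame() before dput() keeps data.table's internal pointer out of the output; the rows can be turned back into a data.table inside the reprex.

```r
library(dplyr)

# The three IDs shown in full in the cbind() warning
warning_ids <- c("4CA9676997DAFFF6", "F3E84A230AF2D676", "A1F2C92308007968")

# Rows matching `ids`, followed by the first `n_extra` rows that don't match
make_reprex_rows <- function(trips, ids, n_extra = 5) {
  bind_rows(
    filter(trips, ride_id %in% ids),
    slice_head(filter(trips, !ride_id %in% ids), n = n_extra)
  )
}

# Usage on the real data (not run here):
# dput(as.data.frame(make_reprex_rows(cyc_trips, warning_ids)))
```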
