Column `x` is of unsupported type quoted call

In a large data frame ("data") with four columns, I need to add another column ("Wait") whose values depend conditionally on the four existing columns. When I try, I get "Error: Column Wait is of unsupported type quoted call".

I would prefer answers that use dplyr with mutate() and case_when().

My dataframe looks like this:

library(dplyr)
library(lubridate)

#Creating a sample dataset
#A = Datetime
#a, b and c are error codes
# 0 means successful
data <- data.frame(a = c("2018-06-01 09:00:17",
                         "2018-06-01 20:31:54",
                         "2018-06-01 20:32:49", 
                         "2018-06-01 20:34:45",
                         "2018-06-02 12:28:27",
                         "2018-06-04 22:01:58",
                         "2018-06-05 01:33:03",
                         "2018-06-05 01:33:12"),
                   b = c(0, 0, 91, 0, 522, 0,501, 501),
                   c = c(511, 250, 250, 250, 501, 511, 501, 501),
                   d = c(0, 522, 0, 559, 200, 0, 501, 501),
                   stringsAsFactors = FALSE) 

#Change character to Datetime
data$a <- as_datetime(data$a, tz="GMT")

This is what I have written (I am not an expert in R):

#Error Code Vector

V <- c(91,200,250,501,511,522)

#Adding DateShift Column
data <- data %>%
  arrange(a) %>%
  mutate(DateShift = lag(a))
data

#Calculating the WaitSecs
data <- data %>%
  mutate(Wait = case_when(b == 0 & c %in% V & d != c & d != b) ~ 1,
         (b == 0 & c %in% V & d != c & d != b) ~ 1,
         (b %in% V & c != b & d == 0) ~ 1,
         (b %in% V & c == b & d != c) ~ a - DateShift,
         TRUE ~ 0)

I get the error below:

Error: Column Wait is of unsupported type quoted call

The expected result should be:

data <- data.frame(a = c("2018-06-01 09:00:17",
                         "2018-06-01 20:31:54",
                         "2018-06-01 20:32:49", 
                         "2018-06-01 20:34:45",
                         "2018-06-02 12:28:27",
                         "2018-06-04 22:01:58",
                         "2018-06-05 01:31:34",
                         "2018-06-05 01:31:46",
                         "2018-06-05 01:33:03",
                         "2018-06-05 01:33:03",
                         "2018-06-05 01:33:12"),
                   b = c(0, 0, 91, 0, 522, 0, 0, 501, 0, 501, 501),
                   c = c(511, 250, 250, 250, 501, 511, 501, 501, 501, 501, 501),
                   d = c(0, 522, 0, 559, 200, 0, 501, 0, 501, 501,0),
                   DateShift = c(NA,
                     "2018-06-01 09:00:17",
                         "2018-06-01 20:31:54",
                         "2018-06-01 20:32:49", 
                         "2018-06-01 20:34:45",
                         "2018-06-02 12:28:27",
                         "2018-06-04 22:01:58",
                         "2018-06-05 01:31:34",
                         "2018-06-05 01:31:46",
                         "2018-06-05 01:33:03",
                         "2018-06-05 01:33:03"),
                   Wait = c(1, 1, 1, 1, 1, 1, 0, 12, 0, 0, 9),
                   stringsAsFactors = FALSE) 
data 

I need help to resolve this. I am also open to other suggestions on how to achieve my result.

Welcome to the community!

For your future questions, please provide a REPRoducible EXample of your problem. If you don't know how to do it, take a look at this thread.

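One problem with your code is the parentheses: the `)` right after the first condition closes case_when() early, so Wait is assigned the unevaluated formula `case_when(...) ~ 1` (a quoted call) instead of a vector, which is what the error refers to. Also, if you use case_when(), all right-hand sides must be of the same type: you can't have some of them difftime and some numeric.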
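A minimal sketch of the parenthesis problem, assuming only dplyr is loaded (the vector `x` and the names `bad` and `good` are illustrative, not from the question). Because `~` quotes both of its sides, closing case_when() before the `~` leaves a formula object rather than a vector:

```r
library(dplyr)

x <- c(1, 2, 3)

# Misplaced ")": case_when() is closed before "~", so nothing is evaluated
# and the result is a formula, i.e. an unevaluated (quoted) call
bad <- case_when(x > 1) ~ 1
class(bad)
#> [1] "formula"

# Correct: every "condition ~ value" pair sits inside case_when()
good <- case_when(x > 1 ~ 1, TRUE ~ 0)
good
#> [1] 0 1 1
```

Inside mutate(), the same mistake assigns that formula to the new column, hence the "quoted call" message.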

Please note that your expected output contains some extra rows compared to the provided input. In my answer below, I've considered all rows of the expected dataset, but the result doesn't fully match your expectation. You want 1 in the 1st, 5th and 6th rows. But for the 1st and 6th rows, d = b, so they violate the condition (b == 0 & c %in% V & d != c & d != b). For the 5th row, d != 0, so it violates (b %in% V & c != b & d == 0). Please check.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date

dataset <- tibble(a = as_datetime(x = c("2018-06-01 09:00:17", "2018-06-01 20:31:54", "2018-06-01 20:32:49", "2018-06-01 20:34:45", "2018-06-02 12:28:27", "2018-06-04 22:01:58", "2018-06-05 01:31:34", "2018-06-05 01:31:46", "2018-06-05 01:33:03", "2018-06-05 01:33:03", "2018-06-05 01:33:12"),
                                  tz = "GMT"),
                  b = c(0, 0, 91, 0, 522, 0, 0, 501, 0, 501, 501),
                  c = c(511, 250, 250, 250, 501, 511, 501, 501, 501, 501, 501),
                  d = c(0, 522, 0, 559, 200, 0, 501, 0, 501, 501,0))

V <- c(91, 200, 250, 501, 511, 522)

dataset %>%
  mutate(DateShift = lag(x = a),
         Wait = case_when((((b == 0) & (c %in% V) & (d != c) & (d != b)) | ((b %in% V) & (c != b) & (d == 0))) ~ as.difftime(tim = 1, units = "secs"),
                          ((b %in% V) & (c == b) & (d != c)) ~ difftime(time1 = a, time2 = DateShift, units = "secs"),
                          TRUE ~ as.difftime(tim = 0, units = "secs")))
#> # A tibble: 11 x 6
#>    a                       b     c     d DateShift           Wait   
#>    <dttm>              <dbl> <dbl> <dbl> <dttm>              <drtn> 
#>  1 2018-06-01 09:00:17     0   511     0 NA                   0 secs
#>  2 2018-06-01 20:31:54     0   250   522 2018-06-01 09:00:17  1 secs
#>  3 2018-06-01 20:32:49    91   250     0 2018-06-01 20:31:54  1 secs
#>  4 2018-06-01 20:34:45     0   250   559 2018-06-01 20:32:49  1 secs
#>  5 2018-06-02 12:28:27   522   501   200 2018-06-01 20:34:45  0 secs
#>  6 2018-06-04 22:01:58     0   511     0 2018-06-02 12:28:27  0 secs
#>  7 2018-06-05 01:31:34     0   501   501 2018-06-04 22:01:58  0 secs
#>  8 2018-06-05 01:31:46   501   501     0 2018-06-05 01:31:34 12 secs
#>  9 2018-06-05 01:33:03     0   501   501 2018-06-05 01:31:46  0 secs
#> 10 2018-06-05 01:33:03   501   501   501 2018-06-05 01:33:03  0 secs
#> 11 2018-06-05 01:33:12   501   501     0 2018-06-05 01:33:03  9 secs

Created on 2019-07-10 by the reprex package (v0.3.0)

Hope this helps.


Super!
I acknowledge the errors you observed. Your corrections are accurate.
Thank you so much, Yarnabrina. That was creative problem solving. I really appreciate it.

Hi Yarnabrina,

Permit me to ask for one more clarification, using the sample dataset you created. I modified it slightly: I changed the first "2018-06-05 01:33:03" (row 9) to "2018-06-05 01:32:01".

dataset <- tibble(a = as_datetime(x = c("2018-06-01 09:00:17", "2018-06-01 20:31:54", "2018-06-01 20:32:49", "2018-06-01 20:34:45", "2018-06-02 12:28:27", "2018-06-04 22:01:58", "2018-06-05 01:31:34", "2018-06-05 01:31:46", "2018-06-05 01:32:01", "2018-06-05 01:33:03", "2018-06-05 01:33:12"),
                                  tz = "GMT"),
                  b = c(0, 0, 91, 0, 522, 0, 0, 501, 0, 501, 501),
                  c = c(511, 250, 250, 250, 501, 511, 501, 501, 501, 501, 501),
                  d = c(0, 522, 0, 559, 200, 0, 501, 0, 501, 501,0))

I tried the following, nesting another case_when() inside a case_when() (I'm not sure whether that is possible):

dataset %>%
  mutate(DateShift = lag(x = a),
         Wait = case_when((((b == 0) & (c %in% V) & (d != c) & (d != b)) | ((b %in% V) & (c != b) & (d == 0))) ~ as.difftime(tim = 1,
                                                                                                                             units = "secs"),
                          ((b %in% V) & (c == b) & (d != c)) ~ difftime(time1 = case_when(b == 0) & (c %in% V) & (d == c) ~ a,
                                                                                                                    TRUE ~ 0,
                                                                        time2 = DateShift,
                                                                        units = "secs"),
                          TRUE ~ as.difftime(tim = 0,
                                             units = "secs")))

I got the error below:

Error in as.POSIXct.default(time1, tz = tz) :
do not know how to convert 'time1' to class “POSIXct”

What I expect:
For this case,
earlier_time <- "2018-06-05 01:32:01"
recent_time <- "2018-06-05 01:33:12"

difftime(recent_time, earlier_time, tz = "GMT", units = "secs")
#> Time difference of 71 secs

So instead of 9 secs, it should be 71 secs.

Hi Yarnabrina,

Please find below:

#  Scenario                                             Wait
1  b = 0, d = 0, c belongs to V                         1
2  b = 0, c belongs to V, (d doesn't match b or c)      1
3  b belongs to V, (c doesn't match b), d = 0           1
4  b belongs to V, (b, c, d are distinct)               1
5  b belongs to V, c = b, (d doesn't match b or c)      time difference of 5 & 6
6  (b = 0, c belongs to V, d = c)                       0
7  Else                                                 0

I'm really sorry, but I couldn't follow this. What are 4 and 5?

Below is a solution in which that part is replaced by the time difference between a and DateShift. Please modify it for your problem:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date

dataset <- tibble(a = as_datetime(x = c("2018-06-01 09:00:17", "2018-06-01 20:31:54", "2018-06-01 20:32:49", "2018-06-01 20:34:45", "2018-06-02 12:28:27", "2018-06-04 22:01:58", "2018-06-05 01:31:34", "2018-06-05 01:31:46", "2018-06-05 01:32:01", "2018-06-05 01:33:03", "2018-06-05 01:33:12"),
                                  tz = "GMT"),
                  b = c(0, 0, 91, 0, 522, 0, 0, 501, 0, 501, 501),
                  c = c(511, 250, 250, 250, 501, 511, 501, 501, 501, 501, 501),
                  d = c(0, 522, 0, 559, 200, 0, 501, 0, 501, 501,0))

V <- c(5, 91, 200, 201, 202, 203, 230, 250, 501, 502, 511, 522, 524, 525, 539, 559)

dataset %>%
  mutate(DateShift = lag(x = a),
         Wait = case_when(((b == 0) & (c %in% V) & (d == 0)) ~ as.difftime(tim = 1, units = "secs"),
                          ((b == 0) & (c %in% V) & (b != d) & (b != c)) ~ as.difftime(tim = 1, units = "secs"),
                          ((b %in% V) & (b != c) & (d == 0)) ~ as.difftime(tim = 1, units = "secs"),
                          ((b %in% V) & (b != c) & (b != d) & (c != d)) ~ as.difftime(tim = 1, units = "secs"),
                          ((b %in% V) & (b == c) & (b != d) & (c != d)) ~ difftime(time1 = a, time2 = DateShift, units = "secs"),
                          ((b == 0) & (c %in% V) & (c == d)) ~ as.difftime(tim = 0, units = "secs"),
                          TRUE ~ as.difftime(tim = 0, units = "secs")))
#> # A tibble: 11 x 6
#>    a                       b     c     d DateShift           Wait   
#>    <dttm>              <dbl> <dbl> <dbl> <dttm>              <drtn> 
#>  1 2018-06-01 09:00:17     0   511     0 NA                   1 secs
#>  2 2018-06-01 20:31:54     0   250   522 2018-06-01 09:00:17  1 secs
#>  3 2018-06-01 20:32:49    91   250     0 2018-06-01 20:31:54  1 secs
#>  4 2018-06-01 20:34:45     0   250   559 2018-06-01 20:32:49  1 secs
#>  5 2018-06-02 12:28:27   522   501   200 2018-06-01 20:34:45  1 secs
#>  6 2018-06-04 22:01:58     0   511     0 2018-06-02 12:28:27  1 secs
#>  7 2018-06-05 01:31:34     0   501   501 2018-06-04 22:01:58  1 secs
#>  8 2018-06-05 01:31:46   501   501     0 2018-06-05 01:31:34 12 secs
#>  9 2018-06-05 01:32:01     0   501   501 2018-06-05 01:31:46  1 secs
#> 10 2018-06-05 01:33:03   501   501   501 2018-06-05 01:32:01  0 secs
#> 11 2018-06-05 01:33:12   501   501     0 2018-06-05 01:33:03  9 secs

Note that the first four cases can be combined into a single condition, and so can cases 6 and 7.
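A sketch of that combination, assuming dplyr is available (base as.POSIXct() stands in for lubridate's as_datetime() here only to keep dependencies small). It departs from the code above in two deliberate ways. First, Wait is kept numeric by wrapping the time difference in as.numeric(), which sidesteps the mixed-type restriction of case_when(). Second, case 2 tests `d != b & d != c`, as scenario 2 in the table states; the code above tests `b != d & b != c`, which lets rows 7 and 9 (where d = c) fall into case 2 instead of case 6.

```r
library(dplyr)

dataset <- data.frame(
  a = as.POSIXct(c("2018-06-01 09:00:17", "2018-06-01 20:31:54", "2018-06-01 20:32:49",
                   "2018-06-01 20:34:45", "2018-06-02 12:28:27", "2018-06-04 22:01:58",
                   "2018-06-05 01:31:34", "2018-06-05 01:31:46", "2018-06-05 01:32:01",
                   "2018-06-05 01:33:03", "2018-06-05 01:33:12"), tz = "GMT"),
  b = c(0, 0, 91, 0, 522, 0, 0, 501, 0, 501, 501),
  c = c(511, 250, 250, 250, 501, 511, 501, 501, 501, 501, 501),
  d = c(0, 522, 0, 559, 200, 0, 501, 0, 501, 501, 0)
)

V <- c(5, 91, 200, 201, 202, 203, 230, 250, 501, 502, 511, 522, 524, 525, 539, 559)

result <- dataset %>%
  mutate(
    DateShift = lag(a),
    Wait = case_when(
      # Cases 1-4 combined: each of them means a 1-second wait
      (b == 0 & c %in% V & d == 0) |
        (b == 0 & c %in% V & d != b & d != c) |
        (b %in% V & b != c & d == 0) |
        (b %in% V & b != c & b != d & c != d) ~ 1,
      # Case 5: seconds elapsed since the previous row (numeric, not difftime)
      b %in% V & b == c & b != d ~ as.numeric(difftime(a, DateShift, units = "secs")),
      # Cases 6 and 7 combined: everything else is 0
      TRUE ~ 0
    )
  )
```

With this version, result$Wait is 1, 1, 1, 1, 1, 1, 0, 12, 0, 0, 9, the same Wait values as the expected result in the first post. Row 11 still uses the plain lag here, so it gives 9 rather than 71.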


Hi,

You are almost there. See new comments below:

Here's my final attempt:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date

dataset <- tibble(a = as_datetime(x = c("2018-06-01 09:00:17", "2018-06-01 20:31:54", "2018-06-01 20:32:49", "2018-06-01 20:34:45", "2018-06-02 12:28:27", "2018-06-04 22:01:58", "2018-06-05 01:31:34", "2018-06-05 01:31:46", "2018-06-05 01:32:01", "2018-06-05 01:33:03", "2018-06-05 01:33:12"),
                                  tz = "GMT"),
                  b = c(0, 0, 91, 0, 522, 0, 0, 501, 0, 501, 501),
                  c = c(511, 250, 250, 250, 501, 511, 501, 501, 501, 501, 501),
                  d = c(0, 522, 0, 559, 200, 0, 501, 0, 501, 501,0))

V <- c(5, 91, 200, 201, 202, 203, 230, 250, 501, 502, 511, 522, 524, 525, 539, 559)

dataset %>%
  mutate(DateShift = lag(x = a),
         CurrentRepeatLength = sequence(nvec = rle(x = b)$lengths),
         CorrespondingDateShift = DateShift[row_number() - CurrentRepeatLength + 1],
         Wait = case_when(((b == 0) & (c %in% V) & (d == 0)) ~ as.difftime(tim = 1, units = "secs"),
                          ((b == 0) & (c %in% V) & (b != d) & (b != c)) ~ as.difftime(tim = 1, units = "secs"),
                          ((b %in% V) & (b != c) & (d == 0)) ~ as.difftime(tim = 1, units = "secs"),
                          ((b %in% V) & (b != c) & (b != d) & (c != d)) ~ as.difftime(tim = 1, units = "secs"),
                          ((b %in% V) & (b == c) & (b != d) & (c != d)) ~ difftime(time1 = a, time2 = CorrespondingDateShift, units = "secs"),
                          ((b == 0) & (c %in% V) & (c == d)) ~ as.difftime(tim = 0, units = "secs"),
                          TRUE ~ as.difftime(tim = 0, units = "secs"))) %>%
  select(-CurrentRepeatLength, -CorrespondingDateShift)
#> # A tibble: 11 x 6
#>    a                       b     c     d DateShift           Wait   
#>    <dttm>              <dbl> <dbl> <dbl> <dttm>              <drtn> 
#>  1 2018-06-01 09:00:17     0   511     0 NA                   1 secs
#>  2 2018-06-01 20:31:54     0   250   522 2018-06-01 09:00:17  1 secs
#>  3 2018-06-01 20:32:49    91   250     0 2018-06-01 20:31:54  1 secs
#>  4 2018-06-01 20:34:45     0   250   559 2018-06-01 20:32:49  1 secs
#>  5 2018-06-02 12:28:27   522   501   200 2018-06-01 20:34:45  1 secs
#>  6 2018-06-04 22:01:58     0   511     0 2018-06-02 12:28:27  1 secs
#>  7 2018-06-05 01:31:34     0   501   501 2018-06-04 22:01:58  1 secs
#>  8 2018-06-05 01:31:46   501   501     0 2018-06-05 01:31:34 12 secs
#>  9 2018-06-05 01:32:01     0   501   501 2018-06-05 01:31:46  1 secs
#> 10 2018-06-05 01:33:03   501   501   501 2018-06-05 01:32:01  0 secs
#> 11 2018-06-05 01:33:12   501   501     0 2018-06-05 01:33:03 71 secs

Below I explain what I have done, based on my understanding (which may well be wrong). If I have misunderstood again, I give up. Sorry.

  1. I calculated DateShift as the simple lag of 1.
  2. I checked whether the current value of b is the first of a run of consecutive equal values or a repetition of the previous row, and stored the position within the current run in CurrentRepeatLength.
  3. I created the column CorrespondingDateShift, which takes the DateShift value of the first row of the current run, i.e. the most recent row in which b is not a repetition of the previous row.
  4. I calculated Wait as before, but computed the time difference against CorrespondingDateShift instead of DateShift.
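The run-length bookkeeping in steps 1 to 3 can be checked in base R alone. This is a minimal sketch on the modified dataset's a and b columns; the lower-case variable names are illustrative stand-ins for the columns created above, and indexing with `a[c(NA, ...)]` is a base-R substitute for dplyr::lag().

```r
# Timestamps and error code b from the modified dataset
a <- as.POSIXct(c("2018-06-01 09:00:17", "2018-06-01 20:31:54", "2018-06-01 20:32:49",
                  "2018-06-01 20:34:45", "2018-06-02 12:28:27", "2018-06-04 22:01:58",
                  "2018-06-05 01:31:34", "2018-06-05 01:31:46", "2018-06-05 01:32:01",
                  "2018-06-05 01:33:03", "2018-06-05 01:33:12"), tz = "GMT")
b <- c(0, 0, 91, 0, 522, 0, 0, 501, 0, 501, 501)

# Step 1: lag of 1 (indexing with NA yields NA for the first row)
date_shift <- a[c(NA, seq_len(length(a) - 1))]

# Step 2: position of each row inside its run of identical b values
# rle(b)$lengths is 2 1 1 1 2 1 1 2, so this gives 1 2 1 1 1 1 2 1 1 1 2
current_repeat_length <- sequence(rle(b)$lengths)

# Step 3: DateShift taken from the first row of each run
first_row_of_run <- seq_along(b) - current_repeat_length + 1
corresponding_date_shift <- date_shift[first_row_of_run]

# Step 4: wait for the last row, measured from the start of its run
as.numeric(difftime(a[11], corresponding_date_shift[11], units = "secs"))
```

The last line gives 71, the value asked for, while row 8 (a run of length 1) still falls back to the plain lag and gives 12. Note that rle() looks only at b, so a "run" is any block of consecutive rows with the same b, regardless of c and d.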

Wow, Yarnabrina. I am impressed. I hope to be as good as you are sometime soon.

Regards