Going to a "column name" that has been calculated - "Error in if (diff >= 1) { : the condition has length > 1"

harrieteena · January 27, 2023, 9:51am

Hello team

I am quite new to coding in R and seem to have been stuck on this for a while now.

For simplicity, I have made a df; the code is below (the forum does not allow me to embed more than one item, as I'm a new joiner)

booked <- data.frame(Room = c(101, 401, 601),
                     "2019-12-10",
                     "2019-12-11",
                     "2019-12-12")

colnames(booked) <- c("Room", "2019-12-10", "2019-12-11", "2019-12-12")


data <- data.frame(Room = c(101, 401, 601),
                   Arrival = c("2019-12-10", "2019-12-11", "2019-12-12"),
                   Departure = c("2019-12-11", "2019-12-13", "2019-12-15"))

I need an output that tells me which days were booked, something like this:
[image "posit2": screenshot of the desired output]

In essence, I need to first get the column name (which is a date) "calculated", but all sorts of errors pop up, including

"Error in if (diff >= 1) { : the condition has length > 1"

for (i in 1:nrow(booked)) {
  for (j in 1:ncol(booked)) {
    diff <- as.numeric(as.Date(data$Departure)) - as.numeric(as.Date(data$Arrival))
    if (diff >= 1) {
      y <- as.numeric(as.Date(data$Arrival)) + diff - 1
      booked$y <- "booked"
      diff <- diff - 1
    } else {
      y <- data$Departure
      booked$y <- "NA"
    }
  }
}

Grateful and thank you if someone is able to figure this out!

technocrat · January 27, 2023, 10:49am

There's an idiom to error messages in R that takes a while to get attuned to. In this case it is directing attention to a "condition" in if(). The argument of if() is cond. From help("if"):

A length-one logical vector that is not NA. Other types are coerced to logical if possible, ignoring any class. (As from R 4.2.0, conditions of length greater than one are an error.)

Translation: Take an object named diff and evaluate

length(diff)

If the return value is not 1 (it won't be here), the condition needs to be modified so that it is. diff is assigned inside the inner for loop from the numeric difference between two whole columns of the data object, so it holds one value per row. To see what that does, evaluate the expression at the top level, in the global environment.

dat <- data.frame(
  Room = c(101, 401, 601),
  Arrival = c("2019-12-10", "2019-12-11", "2019-12-12"),
  Departure = c("2019-12-11", "2019-12-13", "2019-12-15"))

dat |> str()
#> 'data.frame':    3 obs. of  3 variables:
#>  $ Room     : num  101 401 601
#>  $ Arrival  : chr  "2019-12-10" "2019-12-11" "2019-12-12"
#>  $ Departure: chr  "2019-12-11" "2019-12-13" "2019-12-15"
# convert character representation of dates to Date objects
# since it will come up frequently, make it a function

datify <- function(x,y) lubridate::ymd(x[y][[1]])

dat[2] <- datify(dat,2)
dat[3] <- datify(dat,3)
dat |> str()
#> 'data.frame':    3 obs. of  3 variables:
#>  $ Room     : num  101 401 601
#>  $ Arrival  : Date, format: "2019-12-10" "2019-12-11" ...
#>  $ Departure: Date, format: "2019-12-11" "2019-12-13" ...

(length(diff <- (dat[3] - dat[2])[[1]])) == 1
#> [1] FALSE

# create function, just for practice

is_length0 <- function(x,y,z) {
  x[y] = datify(x,y)
  x[z] = datify(x,z)
  (length(diff <- (x[z] - x[y])[[1]])) == 1
}

is_length0(dat,2,3)
#> [1] FALSE

Created on 2023-01-27 with reprex v2.0.2

Note the effect of wrapping an assignment in (): the result is printed immediately as well as being stored.
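
As a minimal sketch of one way to bring the condition down to length one (not necessarily the only fix), the comparison can be made per row by indexing with the loop counter. The names bookings, nights_all and nights are illustrative and do not come from the original post.

```r
bookings <- data.frame(
  Room = c(101, 401, 601),
  Arrival = as.Date(c("2019-12-10", "2019-12-11", "2019-12-12")),
  Departure = as.Date(c("2019-12-11", "2019-12-13", "2019-12-15"))
)

# Subtracting whole columns yields one value per row, so the result has length 3
nights_all <- as.numeric(bookings$Departure - bookings$Arrival)
length(nights_all)

# Indexing by row hands if() the single TRUE/FALSE it requires
for (i in seq_len(nrow(bookings))) {
  nights <- as.numeric(bookings$Departure[i] - bookings$Arrival[i])
  if (nights >= 1) {
    message("Room ", bookings$Room[i], ": ", nights, " night(s)")
  }
}
```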

(Now is a good time to get in the habit of not reusing the names of built-in functions for user-created objects; data is one, and df another. It's often possible to get away with it, but sooner or later R will find the function rather than the data object under that name, and the result is an error such as "object of type 'closure' is not subsettable".)
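
A small sketch of how that error arises, assuming a fresh session in which no object called df has been created yet; msg and bookings are illustrative names.

```r
# In a fresh session `df` is the F-distribution density function from {stats},
# so trying to subset it fails because a function is not a data object
msg <- tryCatch(df$Room, error = function(e) conditionMessage(e))
print(msg)

# A name that does not collide with a built-in function avoids the ambiguity
bookings <- data.frame(Room = c(101, 401, 601))
bookings$Room
```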

for loops are ok in R and sometimes convenient, with important conditions:

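Unlike a function body, a for loop has no scope of its own: objects assigned inside it remain in the enclosing environment after the loop ends. What happens inside a function, by contrast, stays there until it is returned.
It is clearer, and faster when dealing with moderately large objects, to pre-allocate a receiver object outside the loop: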

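# pre-allocate the receiver once, outside the loop
holder <- vector(length = 1e5)
# `something`, `some_function` and `some_arguments` are placeholders
for (i in seq_along(something)) holder[i] <- some_function(some_arguments)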

Vectorized or apply-family equivalents exist and can be faster:

apply(mtcars,1,mean)

Involved control statements can be written in Python or C/C++ and called in an R script through the {reticulate} or {Rcpp} packages.
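
The pre-allocation and vectorization points above can be sketched side by side on the built-in mtcars data. This is only an illustration: rowMeans() is swapped in here as the vectorized counterpart of the apply() call, and the object names are made up for the example.

```r
# Pre-allocate the receiver once, then fill it inside the loop
loop_means <- numeric(nrow(mtcars))
for (i in seq_len(nrow(mtcars))) {
  loop_means[i] <- mean(unlist(mtcars[i, ]))
}

# apply() hides the loop; rowMeans() is the vectorized equivalent
apply_means <- apply(mtcars, 1, mean)
vector_means <- rowMeans(mtcars)

# All three approaches agree
all.equal(loop_means, unname(apply_means))
all.equal(apply_means, vector_means)
```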

Finally, it is profitable to think of R in terms of its original intent and continuing strength. It presents to the user as a functional rather than a procedural language using the school algebra paradigm of f(x) = y.

x is what is to hand, y is what is desired and f is a function that will transform the one into the other. Any of these objects (in R everything is an object, even functions) can be composite in the tradition of f(g(x)) = y. The virtue of this approach is the focus it requires on what the objects are and do, rather than the procedural/imperative process of focusing on how to express the transformation in a stepwise manner.
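
As a rough base-R sketch of f(g(x)) = y applied to the booking question: x is the table of stays, g expands each stay into its booked nights, and f spreads those nights into a room-by-day grid. The function names expand_nights and to_grid are hypothetical, the departure day is assumed not to count as booked, and every stay is assumed to last at least one night.

```r
bookings <- data.frame(
  Room = c(101, 401, 601),
  Arrival = as.Date(c("2019-12-10", "2019-12-11", "2019-12-12")),
  Departure = as.Date(c("2019-12-11", "2019-12-13", "2019-12-15"))
)

# g: one row per room and booked night (departure day excluded)
expand_nights <- function(b) {
  nights <- lapply(seq_len(nrow(b)), function(i) {
    seq(b$Arrival[i], b$Departure[i] - 1, by = "day")
  })
  data.frame(Room = rep(b$Room, lengths(nights)), Night = do.call(c, nights))
}

# f: spread the long table into a room-by-day character matrix
to_grid <- function(long) {
  rooms <- as.character(unique(long$Room))
  days <- as.character(seq(min(long$Night), max(long$Night), by = "day"))
  grid <- matrix(NA_character_, nrow = length(rooms), ncol = length(days),
                 dimnames = list(rooms, days))
  # a two-column character matrix indexes cells by row name and column name
  grid[cbind(as.character(long$Room), as.character(long$Night))] <- "Booked"
  grid
}

booked_grid <- to_grid(expand_nights(bookings))
booked_grid
```

Each function can be inspected on its own, which is the practical benefit of thinking in terms of what the objects are rather than how to step through them.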

nirgrahamuk · January 27, 2023, 12:32pm

Here you go; no loops!

library(tidyverse)
library(lubridate)
(initial_df <- tibble(
  Room = c(101, 401, 601),
  Arrival = (c("2019-12-10", "2019-12-11", "2019-12-12")),
  Departure = (c("2019-12-11", "2019-12-13", "2019-12-15"))
))

(df2 <- mutate(rowwise(initial_df),
  across(
    .cols = -Room,
    .fns = lubridate::ymd
  ),
  booked_dates = list(seq(
    from = Arrival, to =
      Departure, by = "1 day"
  ))
) |> ungroup())

(df3 <- select(df2, 
               Room,
               booked_dates) |>
  unnest(cols = booked_dates) |> 
    mutate(dmy = "Booked"))

(df4 <- pivot_wider(df3,
  id_cols = "Room", 
  names_from = "booked_dates",
  values_from = "dmy"
))

# A tibble: 3 × 7
   Room `2019-12-10` `2019-12-11` `2019-12-12` `2019-12-13` 2019-12…¹ 2019-…²
  <dbl> <chr>        <chr>        <chr>        <chr>        <chr>     <chr>  
1   101 Booked       Booked       NA           NA           NA        NA     
2   401 NA           Booked       Booked       Booked       NA        NA     
3   601 NA           NA           Booked       Booked       Booked    Booked 
# … with abbreviated variable names ¹`2019-12-14`, ²`2019-12-15`

harrieteena · February 2, 2023, 11:02am

As it probably is evident, I indeed used to code using the C family of languages quite a while back. Being back to coding itself has been....a task. Learning a new language altogether has been whooping my ass in all honesty

People were not kidding when they said the R community is the sweetest! Thank you for these valuable pointers. I am not in a position to understand your reply yet, but hopefully one day soon.

harrieteena · February 7, 2023, 10:05am

Thank you so much! Honestly, I'm just days old in using R and you have guided me to a plethora of functions and libraries already.

I am still wrapping my head around this, but from what I understand, this puts all dates from arrival to departure as "booked". The loop function was introduced cause I do need all days from arrival to departure as "booked" EXCEPT for the departure date (as shown in the original question).

Also probably silly, but since I see "tibble" - does this work for a larger data set, somewhere around 5000 rows?

nirgrahamuk · February 7, 2023, 5:39pm

I'm fairly confident that you could subtract 1 from the departure date if you didn't want it to count in the sequence of booked dates.

# inside the mutate(rowwise(initial_df), ...) call from the earlier reply
booked_dates = list(seq(
  from = Arrival,
  to   = Departure - 1,
  by   = "1 day"
))
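
A quick base-R check of that suggestion for a single stay, using room 401's dates from the example; arrival, departure and nights are illustrative names. It assumes the stay lasts at least one night, since seq() raises an error when `to` falls before `from`.

```r
arrival <- as.Date("2019-12-11")
departure <- as.Date("2019-12-13")

# Subtracting 1 from a Date moves it back one day, so the departure day is dropped
nights <- seq(from = arrival, to = departure - 1, by = "1 day")
format(nights)
#> [1] "2019-12-11" "2019-12-12"
```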
