TryCatch with Loop

Hey guys

I am storing two types of data frames on a web server, and a cron job puts the data there. The files follow a naming pattern like data_a_1.csv, data_a_2.csv, data_b_1.csv, data_b_2.csv, etc. Sometimes the cron job fails, which is not the biggest issue in itself, but in my R code I loop over the data and generate new variables from them. Currently I import the datasets, do all the important calculations (saving some variables to arrays outside the loop), and overwrite the old data frames with the new ones in the next iteration of the loop.

I do something like:

for(i in 1:10){
  data_a <- read.csv(url(paste("the_web_address.com", "_data_a_", x[i], ".csv", sep="")))
  data_b <- read.csv(url(paste("the_web_address.com", "_data_b_", x[i], ".csv", sep="")))

  # do some calculations here
}

where x just holds the numbers of the datasets.

So what could happen is that some data is missing (a, b, or both). Let's say data_a and data_b are available for 1-8, then 9 is missing and 10 is available again. R will then give me an error (open.connection) and not do the calculation. What I would like instead: if a URL is not available, just reuse the most recently available data frame and continue with all calculations. We would also need to account for the fact that data_a_5 may be missing while data_b_5 is available.

I guess tryCatch could be useful? But would I need a separate tryCatch for each type of dataset (to account for the issue mentioned above)?

You're definitely onto something :smiley:

Okay, cool. So actually I have two types of datasets on the web server, so let me show you my idea (it does not work currently):

for(i in 1:10){
  tryCatch(
    data_type_a <- read.csv(url(paste("the_web_address.com", "data_type_a_", x[i], ".csv", sep=""))),
    data_type_b <- read.csv(url(paste("the_web_address.com", "data_type_b_", x[i], ".csv", sep=""))),
    error = function(e){
      data_type_a = data_type_a
      data_type_b = data_type_b
    })
  # do some calculations here
}

Does that work in principle? I am aware that there needs to be a data_type_a_1 and data_type_b_1 to start with, but that would be fine
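
On the question of one tryCatch per dataset type: tryCatch takes a single expression as its first argument (several statements have to be grouped with { }), so wrapping each read in its own call keeps a failure in one type from affecting the other. Below is a minimal sketch of that idea, not the real program. fake_read() is a hypothetical stand-in for read.csv(url(...)) that fails for chosen indices, and used_a / used_b exist only to show which dataset ended up being used in each iteration. It also assumes the first dataset of each type is available.

```r
# Hypothetical stand-in for read.csv(url(...)): fails for the indices in `missing`.
fake_read <- function(type, i, missing) {
  if (i %in% missing) stop("cannot open the connection")
  data.frame(type = type, id = i)
}

data_a <- NULL  # last successfully read "a" dataset
data_b <- NULL  # last successfully read "b" dataset
used_a <- c()   # bookkeeping for this demo only
used_b <- c()

for (i in 1:6) {
  # Each read gets its own tryCatch. On error the handler returns the
  # value from the previous iteration, which is then assigned again.
  data_a <- tryCatch(fake_read("a", i, missing = 5),
                     error = function(e) data_a)
  data_b <- tryCatch(fake_read("b", i, missing = c(3, 4)),
                     error = function(e) data_b)

  # do some calculations here
  used_a <- c(used_a, data_a$id)
  used_b <- c(used_b, data_b$id)
}

used_a
# [1] 1 2 3 4 4 6
used_b
# [1] 1 2 2 2 5 6
```

Because the fallback is simply the value that tryCatch returns, no assignment happens inside the handler at all.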

Rather than tryCatch, you can use the simpler try to silently skip problematic portions of loops.
Here is a hopefully easy-to-follow demo:


stopOnFalse <- function(x){
  if(isFALSE(x))
    stop("got a problem here")
}

(vector_to_process1 <- rep(TRUE,10))
(vector_to_process2 <- vector_to_process1)
# poison the vectors
vector_to_process1[[5]] <- FALSE
vector_to_process2[[5]] <- FALSE

for(i in 1:10){
  stopOnFalse(vector_to_process1[[i]])
  vector_to_process1[[i]] <- NA
}
vector_to_process1
# [1]    NA    NA    NA    NA FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
for(i in 1:10){
  try({
    stopOnFalse(vector_to_process2[[i]])
    vector_to_process2[[i]] <- NA
  }, silent = TRUE)
}
vector_to_process2
#[1]    NA    NA    NA    NA FALSE    NA    NA    NA    NA    NA
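
As a side note to the demo above: try does not have to just skip. When the expression fails, try returns an object of class "try-error", which can be tested with inherits() to run some alternative code. A small sketch, where fake_read() is a hypothetical stand-in for a read that fails for index 3, and current / results are illustration-only names:

```r
# Hypothetical stand-in for read.csv(url(...)); index 3 is "missing".
fake_read <- function(i) {
  if (i == 3) stop("cannot open the connection")
  i * 10
}

current <- NA   # value carried over between iterations
results <- c()

for (i in 1:5) {
  attempt <- try(fake_read(i), silent = TRUE)
  if (inherits(attempt, "try-error")) {
    # alternative execution: keep `current` from the previous iteration
    message("read failed for i = ", i, ", reusing previous value")
  } else {
    current <- attempt
  }
  results <- c(results, current)
}

results
# [1] 10 20 20 40 50
```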

What you shared above has a syntax error: both times you have the_web_address.com, the quotes are unterminated. Be careful of that.

ah sorry, I adjusted the example and changed the names, I don't have the unterminated quotes in the real program.

I'll check out your example.

Edit: Your example just silences the errors. But I want alternative execution if an error occurs: I want to "catch" the error and then use the old dataset for all the following code instead.

So I think my idea is not too far from what I want, right? I just can't get it to work yet. I still get the "error in open.connection" problem, the same as when I am not using tryCatch.

I updated the original post to account for the two data types. I still didn't figure it out :frowning:

That does imply that you would have repeated data?



stopOn5 <- function(x){
  if(x==5)
    stop("got a problem here")
}

(vector_to_process2  <- vector_to_process1 <-1:10)

#intend to make the numbers negative in the loop

for(i in 1:10){
  stopOn5(vector_to_process1[[i]])
  vector_to_process1[[i]] <- - vector_to_process1[[i]]
}
vector_to_process1
# [1] -1 -2 -3 -4  5  6  7  8  9 10

for(i in 1:10){
  tryCatch(expr = {
    stopOn5(vector_to_process2[[i]])
    vector_to_process2[[i]] <- - vector_to_process2[[i]]
  },
  error = function(e) {
    cat("fixing ",e$message,"\n")
    # assign alternate behaviour: reuse the previous element
    # (slightly hacky, as it uses global assignment)
    vector_to_process2[[i]] <<- vector_to_process2[[i-1]]
  })
}

#fixing  got a problem here 
vector_to_process2
 #[1]  -1  -2  -3  -4  -4  -6  -7  -8  -9 -10

Oh, yes, sorry. The data has the same structure in every data frame. Basically it is a time series.

Hello guys

Let's stick to the example with one dataset. Imagine I have data_1, data_2, data_5, data_6

What I tried so far:

for(i in 1:length(x)) {
  
  tryCatch(
    data <- read.csv(url(paste("the_web_address.com", "_data_", x[i], ".csv", sep=("")))),
    error = function(e){
      data = data # use the dataset from the previous iteration which should still be in memory
      print("There was an error")
      data_missing = data_missing + 1
    })

# Do useful stuff here
}

Currently that does not really work. It parses data_1 and data_2, then prints "There was an error" twice (for the missing data_3 and data_4), and then does not continue with the calculations for data_5 and data_6. Also, the counter data_missing is not incremented.

Does anybody know a fix?

The error handler is a function with its own scope, so the assignments inside it only affect variables local to that function.
In general, don't use = when you can use <-, and when you want to assign to a variable outside the function (here, a global one) rather than a local one, use <<-.
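
To make the scoping point concrete, here is a tiny sketch. The counters n_local and n_super are hypothetical names used only for this illustration, and the failure is simulated with stop():

```r
n_local <- 0   # updated with <- inside the handler
n_super <- 0   # updated with <<- inside the handler

for (i in 1:3) {
  tryCatch(
    stop("simulated failure"),
    error = function(e) {
      n_local <- n_local + 1    # creates a new variable local to the handler
      n_super <<- n_super + 1   # modifies the variable outside the handler
    }
  )
}

n_local
# [1] 0
n_super
# [1] 3
```

Inside the handler, = behaves like <- here, which is why the counter in the loop above never changes.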

Works nicely thanks!

So the solution to my problem is:

for(i in 1:length(x)) {
  
  tryCatch(
    data <- read.csv(url(paste("the_web_address.com", "_data_", x[i], ".csv", sep=("")))),
    error = function(e){
      data <<- data # use the dataset from the previous iteration which should still be in memory
      print("There was an error")
      data_missing <<- data_missing + 1
    })

# Do useful stuff here
}

Edit: The data <<- data is probably redundant in that mini-example, but in my real code I have to alter the index within the data, so I write something like

data$t = data$t + 1
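
For comparison, the same loop can probably be written without <<- by letting the handler return NULL and doing the fallback in the loop body. This is only a sketch under assumptions: fake_read() is a hypothetical stand-in for read.csv(url(...)) in which datasets 3 and 4 are missing, the index is assumed to be a column t that advances by one per step, t_seen is bookkeeping for the demo, and the first dataset is assumed to exist.

```r
# Hypothetical stand-in for read.csv(url(...)); datasets 3 and 4 are "missing".
fake_read <- function(i) {
  if (i %in% c(3, 4)) stop("cannot open the connection")
  data.frame(t = i, value = i * 10)
}

data <- NULL        # dataset used in the current iteration
data_missing <- 0   # number of failed reads
t_seen <- c()       # bookkeeping for this demo only

for (i in 1:6) {
  # On error the handler just signals failure by returning NULL.
  new_data <- tryCatch(fake_read(i), error = function(e) NULL)

  if (is.null(new_data)) {
    # fallback: reuse the previous data frame and shift its index
    data_missing <- data_missing + 1
    data$t <- data$t + 1
  } else {
    data <- new_data
  }

  # Do useful stuff here
  t_seen <- c(t_seen, data$t)
}

data_missing
# [1] 2
t_seen
# [1] 1 2 3 4 5 6
```

All assignments then happen in the loop itself, so the code behaves the same whether it runs at the top level or inside a function.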

Glad you reached a conclusion :slight_smile:

Using equals signs for assignment is a hard habit to break, but it is worth breaking.
I'm not alone in recommending <-:
https://style.tidyverse.org/syntax.html#assignment-1
http://web.stanford.edu/class/cs109l/unrestricted/resources/google-style.html

Okay, I'll try to use <- in the future :slight_smile:
