I store two types of data frames on a web server, and a cron job puts the data there. The files follow a naming scheme like data_a_1.csv, data_a_2.csv, data_b_1.csv, data_b_2.csv, etc. Sometimes the cron job fails. That is not a big problem in itself, but in my R code I loop over the datasets and generate new variables from them. Currently I import the datasets, do all the important calculations (saving some variables to arrays outside the loop), and overwrite the old data frames with the new ones in the next iteration of the loop.
I do something like:
for (i in 1:10) {
  data_a <- read.csv(url(paste("the_web_address.com", "_data_a_", x[i], ".csv", sep = "")))
  data_b <- read.csv(url(paste("the_web_address.com", "_data_b_", x[i], ".csv", sep = "")))
  # do some calculations here
}
where x just holds the dataset numbers.
So it can happen that some data is missing (either a, b, or both). Let's say data_a and data_b are available for 1-8, then 9 is missing, and 10 is available again. In that case R gives me an error (open.connection) and does not do the calculation. What I would like instead is this: if a URL is not available, just reuse the most recent data frame that was successfully loaded and continue with all calculations. This also needs to account for the case where, say, data_a_5 is missing but data_b_5 is available.
I guess tryCatch could be useful? But would I need a separate tryCatch for each type of dataset (to account for the issue mentioned above)?
Rather than tryCatch, you can use the simpler try to silently skip problematic portions of loops.
Here is a hopefully easy-to-follow demo:
stopOnFalse <- function(x) {
  if (isFALSE(x))
    stop("got a problem here")
}
(vector_to_process1 <- rep(TRUE,10))
(vector_to_process2 <- vector_to_process1)
#poison the vectors
vector_to_process1[[5]] <- FALSE
vector_to_process2[[5]] <- FALSE
for (i in 1:10) {
  stopOnFalse(vector_to_process1[[i]])
  vector_to_process1[[i]] <- NA
}
vector_to_process1
# [1] NA NA NA NA FALSE TRUE TRUE TRUE TRUE TRUE
for (i in 1:10) {
  try({
    stopOnFalse(vector_to_process2[[i]])
    vector_to_process2[[i]] <- NA
  }, silent = TRUE)
}
vector_to_process2
#[1] NA NA NA NA FALSE NA NA NA NA NA
Also, what you shared above has a syntax error: in both places where you have the_web_address.com, the quotes are unterminated. Be careful with that.
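If silently skipping is not enough, note that try() also returns a value: on failure it is an object of class "try-error", which can be checked in order to fall back on an earlier result. A minimal sketch, assuming a made-up fake_read() that stands in for the read.csv(url(...)) call and simply fails for index 3:

```r
# fake_read() is hypothetical: it stands in for read.csv(url(...))
fake_read <- function(i) {
  if (i == 3) stop("cannot open the connection")
  data.frame(id = i)
}

current <- NULL
used <- integer(0)
for (i in 1:5) {
  result <- try(fake_read(i), silent = TRUE)
  if (!inherits(result, "try-error")) {
    current <- result  # update only when the read succeeded
  }
  used <- c(used, current$id)  # record which dataset this iteration used
}
used
# [1] 1 2 2 4 5
```

Iteration 3 reuses the data from iteration 2, and the loop carries on with 4 and 5.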
Ah, sorry, I adjusted the example and changed the names. I don't have the unterminated quotes in the real program.
I'll check out your example.
Edit: Your example just silences the errors. But I want an alternative execution path if an error occurs: I want to "catch" the error and use the old dataset for all the following code instead.
So I think my idea is not too far from what I want, right? I just can't get it to work yet. I still get the "error in open.connection" problem, which is the same as when I am not using tryCatch.
Let's stick to the example with one dataset. Imagine I have data_1, data_2, data_5, data_6.
What I tried so far:
for(i in 1:length(x)) {
  tryCatch(
    data <- read.csv(url(paste("the_web_address.com", "_data_", x[i], ".csv", sep=("")))),
    error = function(e){
      data = data # use the dataset from the previous iteration which should still be in memory
      print("There was an error")
      data_missing = data_missing + 1
    })
  # Do useful stuff here
}
Currently that does not really work. It parses data_1 and data_2, then prints the message "There was an error" twice (for the missing data_3 and data_4), and then does not continue with the calculations for data_5 and data_6. Also, the counter data_missing is not incremented.
The error handler is a function with its own scope, so assignments inside it only affect variables local to that function.
In general, don't use = when you can use <-, and when you want to assign to a variable outside the function rather than to a local one, use <<-.
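A small illustration of the scoping difference, using made-up counter names:

```r
counter_local <- 0
counter_global <- 0

tryCatch(
  stop("boom"),
  error = function(e) {
    # `<-` creates a new variable that exists only inside the handler
    counter_local <- counter_local + 1
    # `<<-` updates the existing variable outside the handler
    counter_global <<- counter_global + 1
  }
)

counter_local   # still 0
counter_global  # now 1
```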
for (i in 1:length(x)) {
  tryCatch(
    data <- read.csv(url(paste("the_web_address.com", "_data_", x[i], ".csv", sep = ""))),
    error = function(e) {
      data <<- data # use the dataset from the previous iteration which should still be in memory
      print("There was an error")
      data_missing <<- data_missing + 1
    })
  # Do useful stuff here
}
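An alternative that avoids <<- altogether is to let tryCatch() return either the new data or the previous data, wrapped in a small helper so that data_a and data_b are handled independently. This is only a sketch: read_or_keep() and fake_read() are hypothetical names, and fake_read() simulates the download instead of calling read.csv(url(...)).

```r
# Hypothetical stand-in for read.csv(url(...)): fails for the listed indices
fake_read <- function(type, i, missing) {
  if (i %in% missing) stop("cannot open the connection")
  data.frame(type = type, id = i)
}

# Return freshly read data, or `previous` if reading fails
read_or_keep <- function(read_fun, previous) {
  tryCatch(read_fun(), error = function(e) previous)
}

data_a <- NULL
data_b <- NULL
ids_a <- integer(0)
ids_b <- integer(0)

for (i in 1:4) {
  data_a <- read_or_keep(function() fake_read("a", i, missing = 3), data_a)
  data_b <- read_or_keep(function() fake_read("b", i, missing = 2), data_b)
  # Do useful stuff here; this sketch just records which dataset was used
  ids_a <- c(ids_a, data_a$id)
  ids_b <- c(ids_b, data_b$id)
}

ids_a  # 1 2 2 4
ids_b  # 1 1 3 4
```

Each type falls back on its own previous data frame, so a missing data_a file does not affect data_b. If the very first file is missing, the result stays NULL, so the real code would need to handle that case.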
Edit: The data <<- data is probably redundant in that mini-example, but in my real example I have to alter the index within the data, so I write something like