Hey guys
I am storing two types of dataframes on a webserver, and a cronjob puts the data there. The files follow a naming pattern like data_a_1.csv, data_a_2.csv, data_b_1.csv, data_b_2.csv, etc. Sometimes the cronjob fails, which in itself is not a big problem, but in my R code I loop over the data and generate new variables from it. Currently I import the datasets, do all the important calculations (and save some variables to arrays outside the loop), and then overwrite the old dataframes with the new ones in the next iteration of the loop.
I do something like:
for (i in 1:10) {
  data_a <- read.csv(url(paste0("the_web_address.com", "_data_a_", x[i], ".csv")))
  data_b <- read.csv(url(paste0("the_web_address.com", "_data_b_", x[i], ".csv")))
  # do some calculations here
}
where x just holds the file indices.
So what can happen is that some data is missing (a, b, or both). Say data_a and data_b are available for 1-8, then 9 is missing, and 10 is available again. R will then obviously throw an error (open.connection) and skip the calculation. What I would like instead: if a URL is not available, just reuse the previously available dataframe and continue all calculations. We also need to account for the case where, say, data_a_5 is missing but data_b_5 is available.
I guess tryCatch could be useful here? But would I need an individual tryCatch for each of the two dataset types (to handle the case mentioned above)?
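Something like this is what I have in mind — a small helper that wraps each download in its own tryCatch and falls back to the previously loaded dataframe on failure (read_or_keep is just a name I made up, and the URL pieces are placeholders as above):

```r
# Hypothetical helper: try to read a CSV from url_string; if the
# connection fails, return fallback (the previous dataframe) instead.
read_or_keep <- function(url_string, fallback) {
  tryCatch(
    read.csv(url(url_string)),
    error = function(e) fallback
  )
}

data_a <- NULL  # nothing to fall back to before the first iteration
data_b <- NULL
for (i in 1:10) {
  data_a <- read_or_keep(paste0("the_web_address.com", "_data_a_", x[i], ".csv"), data_a)
  data_b <- read_or_keep(paste0("the_web_address.com", "_data_b_", x[i], ".csv"), data_b)
  # do some calculations here using data_a and data_b
}
```

Since the helper is called once per file, the case where data_a_5 is missing but data_b_5 is available would be handled automatically, without writing two separate tryCatch blocks inline.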