While Loop W/ Nested If Else


#1

Hi All. Can't figure out what's quite wrong with my loop and conditionals. Basically I want to check to see if a file exists, if it does, read it in and send to a database. There are 3 files and I want to load all 3. If I have not yet loaded all 3, I want it to keep running till the job is done. Here's my code.


x <- 0
while(x != 3){
    if(file.exists(file1.csv)) {
        file1<- fread("file1.csv", stringsasfactors = FALSE)
        sqlSave(con, file1, tablename = 'dbo.file1', colnames = FALSE, rownames = FALSE, append = TRUE)
        x = x + 1
    } else if(file.exists(file2.csv)) {
        file2<- fread("file2.csv", stringsasfactors = FALSE)
        sqlSave(con, file2, tablename = "dm.file2", colnames = FALSE, rownames = FALSE, append = TRUE)
        x = x + 1
    } else if(file.exists(file3.csv)) {
        file3<- fread("file3.csv", stringsasfactors = FALSE)
        sqlSave(con, file3, tablename = "dm.file3", colnames = FALSE, rownames = FALSE, append = TRUE)
        x = x + 1
    }
}

My idea was to have each successful condition/load add 1 to x. So when X got to 3, it knew to stop.

Thanks in advance.


#2

Hi @kwgood1980! Something we find quite often on these forums (for example, in this thread) is people trying to approach a problem in R as they would in another language. I've certainly done it a lot!

My read on your post is that you have a bunch of CSV files, and you want to send each one to a database (sorry to sound like a call centre employee; I just wanna make sure I have it right).

If you'd prefer to stick with loops, you can actually iterate directly over the values of a vector in R, or you can generate a vector of indices in another vector with seq_along. This is a lot less prone to bugs. For example:

myfiles = c('apple.csv', 'banana.csv', 'pineapple.csv')

for (i in seq_along(myfiles)) {
  # note: check the case in stringsAsFactors!
  df <- fread(myfiles[i], stringsAsFactors = FALSE)
  sqlSave(con, df, tablename = paste0('dbo.file', i), colnames = FALSE, rownames = FALSE, append = TRUE)
}

Another solution using the purrr package would be to have a data frame with columns (vectors) for CSV file names and associated database table names, and to then apply a function over these two columns element-by-element. The function would load each CSV in and save to the database table:

library(tidyverse)
library(purrr)
myfiles = data_frame(
  csv = c('apple.csv', 'banana.csv', 'pineapple.csv'),
  table = c('dbo.apple', 'dbo.banana', 'dbo.pineapple'))

# map2 calls a function for each pair of elements fro mtwo vectors
map2(myfiles$csv, myfiles$table, function(csv_i, table_i) {
  # in this function, csv_i is an element of myfiles$csv and
  # table_i is an element of myfiles$table
  df <- fread(csv_i, stringsAsFactors = FALSE)
  sqlSave(con, df, tablename = table_i, colnames = FALSE, rownames = FALSE, append = TRUE)
})

If you're interested in how purrr functions like map2 work, check out this chapter of Advanced R! it's a very powerful way of working! I hope one of these approaches works for you :slight_smile:


#3

Hey thanks for the reply.

The gist is that 3 files are being created, A,B,C.

The problem is they get created at different times, and I need to automate it. So, if A drops, it gets scooped up and appended to the table in the database, but the loop keeps going because X != 3. Then file C gets finished, and the loop scoops it up, does the same, and now X = 2. Final file gets output an hour later, scooped up, appended to the table and now since X = 3, the loop exits and continues the rest of the code.

The big issue is I have no idea when the 3 files will be created so I made a second light weight program that can run in the background infinitely till the files are ready to be moved to the database.


#4

Ahh, I see! That's a bit more complicated.

Do the files necessarily become available in a sequential manner (file1.csv finishes, then file2.csv, then file3.csv...), or could they become available in any order? Are they being generated by some external process?

I feel like some sort of reactive event handler pattern might be more appropriate in this case, where you have a function that can react to a table being dropped. I'm not sure that R is necessarily the best tool for the job in that case; maybe some service that can watch for file updates can call your R code as a script using RScript.

Hopefully some other folks will drop into this thread (ha!) to think about other solutions.


#5

There's an AWS Lambda function that calls a bash script which fires off two RScripts. Those Rscripts will produce 3 files, however, the order is going to be random. The idea was to have the bash script call a third RScript(this one) that would continually loop in the background. 3 files get dropped, 3 files get inserted to the database, then it leaves the loop.


#6

I'm not sure how to solve, but you might also want a test to check if you've already sent the file to database, otherwise it looks like it will re-upload the available files continuously until done. Maybe something along these lines of pseudocode:

  1. Make table listing file names, availability, and upload completion.
  2. While upload completion < 3...
  3. Go through the list of files to find any that are BOTH un-uploaded and available. If any match, upload and mark uploaded.

#7

That's a good point. I think after the sqlsave I would just remove the file, increase the counter, and keep going through the loop.


#8

That'd probably work too. And if you're going through the trouble of removing the file from your list, you could just keep going until there's nothing left in the list.


#9

There are two problems with your while statement:

  1. It will break very quickly after any file exists. That is, once file2.csv (say) is written and the others aren't, it will be uploaded three times then the loop will break.
  2. It will only work when all of the files come in to existence in the order file3, file2, file1 and only at exactly the right time. Put another way, because of the elses once file1.csv is uploaded, it is impossible for the others to be uploaded afterwards.

You should test explicitly for the condition you want, using the return value of sqlSave as your trigger.

file1_absent <- file2_absent <- file3_absent <- TRUE
while (file1_absent || file2_absent || file3_absent) {
  if (file1_absent && file.exists(file1.csv)) {
    file1 <- fread("file1.csv", stringsasfactors = FALSE)
    file1_absent <- 
      !sqlSave(con, file2, tablename = "dm.file1", colnames = FALSE, rownames = FALSE, append = TRUE)
  }
  if (file2_absent &&file.exists(file2.csv)) {
    file2 <- fread("file2.csv", stringsasfactors = FALSE)
    file2_absent <- 
      !sqlSave(con, file2, tablename = "dm.file2", colnames = FALSE, rownames = FALSE, append = TRUE)
  }
  if (file3_absent && file.exists(file3.csv)) {
    file3 <- fread("file3.csv", stringsasfactors = FALSE)
    file3_absent <- 
      !sqlSave(con, file2, tablename = "dm.file3", colnames = FALSE, rownames = FALSE, append = TRUE)
  }
}