I have scheduled an R script that pulls data from MS SQL Server and runs a regression. The script works only with new data.
For example, if 100 observations covering 01.01.2017-03.03.2017 were loaded today, the script runs the regression on that data.
If 100 observations covering 04.03.2017-04.06.2017 are loaded tomorrow, the script works with those observations only, not with the whole range 01.01.2017-04.06.2017.
I asked on this forum how to make R work only with data that has a fresh date, and got this useful answer,
in which we keep a log of the last date and select only the data newer than it.
If anyone is interested, here is the link
    # READ DATE FROM LOG FILE
    log_dt <- readLines("/path/to/SQL_MaxDate.txt", warn=FALSE)

    # QUERY WITH WHERE CLAUSE
    sql <- paste0("SELECT Dt, CustomerName, ItemRelation, SaleCount, DocumentNum, DocumentYear, IsPromo FROM dbo.mytable WHERE Dt > '", log_dt, "'")
    df <- sqlQuery(dbHandle, sql)

    # RETRIEVE MAX DATE VALUE
    max_DT <- as.character(max(df$Dt))

    # ...here code for regression, now, it's not important for this question

    # WRITE DATE TO LOG FILE
    cat(max_DT, file="/path/to/SQL_MaxDate.txt")
The question: my scheduler runs once per day, but the data is not loaded into the SQL database every day; it may be loaded, for example, once every 3 days.
Can R check for this?
If R determines that there is no new data (no fresh date), the script should not run.
If R determines that there is new data, it should run the script.
For example, suppose the last date the script processed was 12.05.2018, and on 13.05.2018 the script is started by the scheduler, but nothing was loaded into SQL on that date. R should see that there is no new date (the latest date is still the same) and not run the rest of the script. R must do this check on every run.
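To make the idea concrete, here is a rough sketch of the check I have in mind. The helper name `has_new_data` is made up for illustration, and I am assuming the dates can be parsed by `as.Date()` (for example the ISO form `2018-05-12`) and that comparing by day is enough. It may well not be the right way to do it.

```r
# Hypothetical helper (not from the linked answer): returns TRUE only if the
# newest date in the table is later than the date stored in the log file.
# Assumes both values can be parsed by as.Date(), e.g. "2018-05-12".
has_new_data <- function(latest_db_date, log_date) {
  latest <- as.Date(latest_db_date)
  logged <- as.Date(log_date)
  # isTRUE() treats NA (empty table) and zero-length input (empty log file)
  # as "nothing new"
  isTRUE(latest > logged)
}

# In the scheduled script it might be used roughly like this (not run here;
# dbHandle, the table and the log path are the same as in the code above):
#   log_dt <- readLines("/path/to/SQL_MaxDate.txt", warn = FALSE)
#   latest <- sqlQuery(dbHandle, "SELECT MAX(Dt) AS MaxDt FROM dbo.mytable")$MaxDt
#   if (!has_new_data(latest, log_dt)) quit(save = "no", status = 0)
#   ...otherwise continue with the query, the regression and the log update

has_new_data("2018-05-12", "2018-05-12")  # FALSE: nothing was loaded
has_new_data("2018-05-13", "2018-05-12")  # TRUE: fresh data arrived
```

The point is that when nothing new was loaded the script would stop before the regression, and the log file would keep the same last date.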
Is this possible?