Reset counter on new date & update counter on current date

ghjk · June 4, 2022, 4:24pm

I am currently trying to create and write to a file from RStudio that would do the following:

Each time I run an R script on a given date, the file would contain an updated counter (e.g. if I run the script 10 times at various times during a day, the counter should be 10).
The counter would automatically reset to 0 on a new date when I open the same R script, and increment each time I run that script.

Can anyone please help me with a way to achieve this task?

FJCC · June 4, 2022, 5:46pm

Here is a simple-minded way to do that. CounterTrack.txt stores the date and the count, and FileWithCount.txt holds the current count plus whatever else you want.

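# CounterTrack.txt holds two lines: the date of the last run and that day's count
if(!file.exists("CounterTrack.txt")) {
  writeLines(c(as.character(Sys.Date()), 0), con = "CounterTrack.txt")
}

Info <- readLines("CounterTrack.txt")
if(as.Date(Info[1]) == Sys.Date()){
  # Same day: increment the stored count
  Info[2] <- as.numeric(Info[2]) + 1
} else {
  # New day: store today's date and restart the count at 1
  Info[1] <- as.character(Sys.Date())
  Info[2] <- 1
}

writeLines(Info, "CounterTrack.txt")

# Copy the current count into the file you actually want to use
writeLines(Info[2], con = "FileWithCount.txt")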

ghjk · June 4, 2022, 8:02pm

Many thanks for your help. The code you showed is helpful, but it doesn't cover two major sub-tasks:

It looks like, to guarantee that the counter is updated on every run at different times, we need to make sure the user runs the R script from start to finish, even if we put the code block above at the beginning of the script?
If I want CounterTrack.txt to be deleted at the end of each day (based on the computer's clock) to save disk space, is there any way to accomplish that? Or is this unnecessary for your script, since CounterTrack.txt only ever has 2 lines, regardless of how many days the script is run and how many times it is run within each day?

Thanks again for your help. I was thinking of storing the counter in a tempfile(), but that does not work because the file is deleted when the user closes the R session, which is a bummer. I tried changing the directory that tempfile() uses, but when I did and went to look for the file, I couldn't find it on my computer.
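A likely explanation for the missing file: tempfile() only returns a randomly generated path and does not create anything, and its default directory, tempdir(), is removed when the session ends. A minimal sketch of the difference, assuming you are free to keep the counter in a per-user data directory; the directory name "ghjk-counter" is hypothetical, and tools::R_user_dir() needs R 4.0 or later:

```r
# tempfile() only generates a random, unused file name; it does not create the file.
p <- tempfile(pattern = "CounterTrack_", fileext = ".txt")
print(file.exists(p))  # FALSE: nothing has been written to p yet

# Even with tmpdir = <your folder>, each call (and each session) gets a new
# random name, so a later session would not know which file to read back.

# A fixed path in a directory that outlives the session is easier to find again.
# "ghjk-counter" is a hypothetical name for this counter's data directory.
counter_dir <- tools::R_user_dir("ghjk-counter", which = "data")
dir.create(counter_dir, recursive = TRUE, showWarnings = FALSE)
counter_path <- file.path(counter_dir, "CounterTrack.txt")
print(counter_path)  # where to look for the file on disk
```

counter_path could then be used wherever "CounterTrack.txt" appears in the code above.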

FJCC · June 4, 2022, 8:37pm

I don't know of any way to keep the count correct if people can select parts of the code to run.

I'm sure it is possible on any operating system to schedule a task to delete a certain file at a given time. On a Linux system, that would be a cron job; Windows has Task Scheduler and macOS has launchd (and cron), which serve the same purpose.

ghjk · June 4, 2022, 8:59pm

Do you know if we can do it within R, though? I am thinking of deleting the CounterTrack.txt file created on the previous day every time the script is run on a new day. That means the file name would probably need to change to paste0("CounterTrack", "_", Sys.Date(), ".txt"), and I would add another line inside if(!file.exists(paste0("CounterTrack", "_", Sys.Date(), ".txt"))), such as unlink(files(....., pattern = [all .txt file names starting with CounterTrack and containing a date earlier than Sys.Date()])).

Do you think this would work?

FJCC · June 4, 2022, 9:31pm

Yes, you can have the file name include the date and you can use the file.remove() or unlink() functions to delete files you do not want. I suggest you make the file names very clearly distinct from any other possible file if you will be searching and deleting automatically.
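Following that suggestion, here is a minimal sketch of a date-stamped counter file that deletes earlier days' files the first time the script runs on a new day. The prefix CounterTrack_ and the functions counter_file() and bump_daily_counter() are hypothetical names; the cleanup pattern only matches names of the exact form CounterTrack_YYYY-MM-DD.txt, which keeps unrelated files out of reach:

```r
# Hypothetical prefix: keep it distinctive so the cleanup cannot hit unrelated files
counter_prefix <- "CounterTrack_"

# Path of the counter file for a given date, e.g. CounterTrack_2022-06-04.txt
counter_file <- function(date = Sys.Date(), dir = ".") {
  file.path(dir, paste0(counter_prefix, format(date, "%Y-%m-%d"), ".txt"))
}

# Increment today's count and return it. On the first run of a new day,
# delete counter files whose embedded date is earlier than today.
bump_daily_counter <- function(dir = ".", today = Sys.Date()) {
  today_file <- counter_file(today, dir)
  if (!file.exists(today_file)) {
    name_regex <- paste0("^", counter_prefix, "([0-9]{4}-[0-9]{2}-[0-9]{2})\\.txt$")
    old <- list.files(dir, pattern = name_regex)
    old_dates <- as.Date(sub(name_regex, "\\1", old), format = "%Y-%m-%d")
    unlink(file.path(dir, old[old_dates < today]))
    count <- 1L
  } else {
    count <- as.integer(readLines(today_file, n = 1)) + 1L
  }
  writeLines(as.character(count), today_file)
  count
}
```

Calling n <- bump_daily_counter() at the top of the script returns today's run number; it still only counts runs in which that line is actually executed.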

ghjk · June 5, 2022, 5:16am

Thank you very much for your quick feedback. It took me a while to think through how to adapt your code to my larger-scale situation, so that all the existing output file names contain the number of runs without incurring additional storage costs in the cloud. My situation requires renaming several output files when the same R script is run 200+ times.

For simplicity, suppose there are 3 output files (in reality, there can be 100). Also assume the first 10 runs belong to the 1st output file (i.e., this file is overwritten 10 times), the next 50 runs produce the 2nd output file (overwritten 50 times), and the next 40 runs produce the 3rd output file (overwritten 40 times). If the current naming convention of these 3 output files, after running the same R script 100 times, is [Job ID]_DATE1_DATE2_[Sys.Date()], how could we modify the above code to produce [Info[2]JOB_ID]_DATE1_DATE2_[Sys.Date()], so that we have only 1 (potentially big) CounterTrack_[JOB ID].txt file per day, regardless of the number of Job IDs? In other words, how do we use only 1 .txt file while still being able to update the counter Info[2] that goes into the names of the 3 output files?

ghjk · June 6, 2022, 12:26am

Nobody wants to help me further with this problem?

FJCC · June 6, 2022, 12:33am

I am a bit confused about all of your requirements. Here is my latest attempt to do what I think are the basics. There will be one CounterTrack.txt file that holds the counts for all of the scripts. The first line of this file will be the current date. Each line below that will be a Job ID followed by a colon and the corresponding count. I intend for the Job ID to identify which script is running. When a script is run for the first time on a given day, the file will not yet have a line for it, so the Job ID and a count of 1 are added at that time. On subsequent runs, the count is incremented.
I did very little testing of the code, so expect to find some errors.

library(readr)
library(stringr)

# Read the tracker; start a new one if it is missing or from an earlier day
if(file.exists("CounterTrack.txt")) {
  Tracker <- read_lines("CounterTrack.txt", lazy = FALSE)
  if(Tracker[1] != as.character(Sys.Date())) {
    Tracker <- as.character(Sys.Date())
  }
} else {
  Tracker <- as.character(Sys.Date())
}

ProcessID <- "Job1"

# Match the whole "JobID:" prefix so that "Job1" does not also match "Job10:..."
Location <- which(startsWith(Tracker, paste0(ProcessID, ":")))
if(length(Location) == 0) {
  # First run of this job today
  Tracker <- c(Tracker, paste(ProcessID, 1, sep = ":"))
} else {
  # Increment the count stored after the colon
  tmp <- Tracker[Location]
  Val <- str_extract(tmp, "\\d+$")
  Val <- as.numeric(Val) + 1
  Tracker[Location] <- paste(ProcessID, Val, sep = ":")
}
write_lines(Tracker, file = "CounterTrack.txt")
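To use one tracker file for many Job IDs within the same script, the logic above could be wrapped in a function and called once per Job ID. A minimal sketch, assuming the same file layout (date on line 1, then JobID:count lines); bump_job_count() is a hypothetical name, and it uses base readLines()/writeLines() instead of readr only to avoid package dependencies:

```r
# Hypothetical helper: increment and return today's count for one Job ID,
# keeping all Job IDs in a single tracker file.
bump_job_count <- function(job_id, tracker_file = "CounterTrack.txt", today = Sys.Date()) {
  today_str <- as.character(today)
  tracker <- if (file.exists(tracker_file)) readLines(tracker_file) else character(0)
  # Start a fresh tracker when the file is missing, empty, or from another day
  if (length(tracker) == 0 || tracker[1] != today_str) tracker <- today_str

  key <- paste0(job_id, ":")             # full "JobID:" prefix avoids partial matches
  loc <- which(startsWith(tracker, key))
  if (length(loc) == 0) {
    count <- 1L                          # first run of this job today
    tracker <- c(tracker, paste0(key, count))
  } else {
    count <- as.integer(substring(tracker[loc[1]], nchar(key) + 1)) + 1L
    tracker[loc[1]] <- paste0(key, count)
  }
  writeLines(tracker, tracker_file)
  count
}
```

The returned count can then be prefixed to the Job ID when building each output file name, e.g. inside a loop over the Job IDs handled by the script.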

ghjk · June 6, 2022, 6:39pm

Many thanks for your patience and the very good code to build upon. I should have been clearer: I run only one R script. The script takes multiple Job IDs, and for each Job ID it writes an output file named JobID_Date1_Date2_CurrentRunDate.tgz. I won't have a big CounterTrack.txt file listing all the Job IDs up front, since new Job IDs might come up on different days. Each time I run the script, several output files with distinct Job IDs are produced. How do I use the Job IDs and CurrentRunDate from the existing output file names to compute the run numbers and automatically put them in front of the Job IDs each day?
Example: I currently have two output files named ABC_05062022_06302022_06062022.tgz and ADE_05062022_07302022_06062022.tgz. If I run the script with Job IDs ABC and ADE 5 and 10 times, respectively, I want the files renamed to 5ABC_05062022_06302022_06062022.tgz and 10ADE_05062022_07302022_06062022.tgz. Then, if I run them 3 and 4 times tomorrow, I would like to have 3ABC_05062022_06302022_06072022.tgz and 4ADE_05062022_07302022_06072022.tgz.
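One way to avoid a separate tracker file entirely is to read the previous count back out of today's output file name. A minimal sketch, assuming names of the form [count]JobID_Date1_Date2_RunDate.tgz with dates written as mmddyyyy, Job IDs that are alphanumeric and do not start with a digit, and that an unprefixed file (left over from the old naming) counts as one earlier run; next_output_path() is a hypothetical helper:

```r
# Hypothetical helper: find today's file(s) for job_id in dir and return both
# their paths and the path for the next run with the count incremented.
next_output_path <- function(dir, job_id, date1, date2, run_date = Sys.Date()) {
  run_str <- format(run_date, "%m%d%Y")  # e.g. 06062022
  # Optional leading count, this Job ID, any Date1/Date2, then today's run date
  name_regex <- paste0("^([0-9]*)", job_id, "_[0-9]{8}_[0-9]{8}_", run_str, "\\.tgz$")
  existing <- list.files(dir, pattern = name_regex)
  prev <- 0L
  if (length(existing) > 0) {
    counts <- as.integer(sub(name_regex, "\\1", existing))  # "" becomes NA
    counts[is.na(counts)] <- 1L  # assumption: an unprefixed file is one earlier run
    prev <- max(counts)
  }
  new_name <- paste0(prev + 1L, job_id, "_", date1, "_", date2, "_", run_str, ".tgz")
  list(old = file.path(dir, existing), new = file.path(dir, new_name))
}

# Usage for one job and one run (out_dir and the writing step are placeholders):
# paths <- next_output_path(out_dir, "ABC", "05062022", "06302022")
# ...write this run's .tgz to paths$new...
# unlink(paths$old)  # keep only the latest file for this job and day
```

Because the count lives in the file name itself, no tracker file has to be stored or cleaned up; the trade-off is that a change in the Job ID spelling or date format silently restarts the count at 1.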

system · June 13, 2022, 6:40pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.
