How to select the file based on last modified time?

AbhishekHP · July 18, 2019, 3:18pm

Basically, we need to select the data from the most recent file.
However, the file names have a few discrepancies, so we are using stringr and rebus to clean them.

Can we use this info to select the most recent file name?

Please find the simplified reprex:

library(tidyverse)
library(rebus)

myfiles <- tribble(
  ~files,~last_modified,
  "file_2014_01.csv", "2019-07-17T14:00:20.000Z",
  "file_2014_01 ", "2019-07-17T14:00:21.000Z",
  "file_2014_01.csv", "2019-07-17T13:59:36.000Z",
  "file_2014_01fdn.csv", "2019-07-17T14:00:23.000Z",
  "file_2014_01.csv", "2019-07-17T14:00:11.000Z",
  "file_2014_01.csv", "2019-07-17T14:00:27.000Z", # Most recent
  "äsdfile_2014_03.csv", "2019-06-17T14:00:23.000Z",
  "qwerfile_2014_03 ", "2019-07-15T14:00:21.000Z",
  "file_2014_03.csv", "2019-01-17T13:59:36.000Z",
  "bfffile_2014_03fdn.csv", "2019-06-17T14:00:32.000Z",
  "cvfile_2014_03.csv", "2019-07-14T14:00:11.000Z",
  "uufile_2014_03.csv", "2019-2-17T15:00:23.000Z" # Most recent
)

# Extract the year_month key (e.g. "2014_01") shared by files from the same month
to_group <- myfiles %>% select(files) %>% unlist() %>%
  str_extract(pattern = one_or_more(DGT) %R% ANY_CHAR %R%
                one_or_more(DGT))

# Distinct year_month keys to choose from
to_group %>% unique()

# How can we use this info to select the most recent file per month from myfiles?

andresrcs · July 18, 2019, 9:50pm

This is a solution with regular expressions instead of rebus.

Note: The last row, which you marked "# Most recent" for the 2014_03 group, is not actually the most recent datetime for that group, because it is in February.
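To illustrate the note above, here is a minimal sketch comparing the two timestamps as plain text and as parsed date-times. The vector name `stamps` is only for illustration, and `method = "radix"` is used so the text sort does not depend on the locale.

```r
library(lubridate)

# The two candidate timestamps from the 2014_03 group
stamps <- c("2019-07-15T14:00:21.000Z", "2019-2-17T15:00:23.000Z")

# Sorted as text, the February stamp comes last, because "2" sorts
# after "0" at the first character where the strings differ.
sort(stamps, method = "radix")

# Parsed as date-times, the July stamp is the later of the two.
parsed <- ymd_hms(stamps)
latest_stamp <- stamps[which.max(parsed)]
latest_stamp
```

This is why the solution converts last_modified with ymd_hms() before taking the maximum.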

library(tidyverse)
library(lubridate)

myfiles <- tribble(
  ~files,~last_modified,
  "file_2014_01.csv", "2019-07-17T14:00:20.000Z",
  "file_2014_01 ", "2019-07-17T14:00:21.000Z",
  "file_2014_01.csv", "2019-07-17T13:59:36.000Z",
  "file_2014_01fdn.csv", "2019-07-17T14:00:23.000Z",
  "file_2014_01.csv", "2019-07-17T14:00:11.000Z",
  "file_2014_01.csv", "2019-07-17T14:00:27.000Z", # Most recent
  "äsdfile_2014_03.csv", "2019-06-17T14:00:23.000Z",
  "qwerfile_2014_03 ", "2019-07-15T14:00:21.000Z",
  "file_2014_03.csv", "2019-01-17T13:59:36.000Z",
  "bfffile_2014_03fdn.csv", "2019-06-17T14:00:32.000Z",
  "cvfile_2014_03.csv", "2019-07-14T14:00:11.000Z",
  "uufile_2014_03.csv", "2019-2-17T15:00:23.000Z" # Most recent
)

myfiles %>% 
  mutate(group = str_extract(files, "\\d{4}.\\d{2}"),
         last_modified = ymd_hms(last_modified)) %>% 
  group_by(group) %>% 
  filter(last_modified == max(last_modified))
#> # A tibble: 2 x 3
#> # Groups:   group [2]
#>   files               last_modified       group  
#>   <chr>               <dttm>              <chr>  
#> 1 file_2014_01.csv    2019-07-17 14:00:27 2014_01
#> 2 "qwerfile_2014_03 " 2019-07-15 14:00:21 2014_03

Created on 2019-07-18 by the reprex package (v0.3.0)
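A possible variant, assuming dplyr 1.0.0 or later is installed: slice_max() keeps the latest row per group, and with_ties = FALSE guarantees exactly one row even if two files share a timestamp. The sketch below uses a shortened subset of the data above, and the names `latest` and `latest_names` are only for illustration. It also trims the stray trailing space from the selected file names with str_trim().

```r
library(tidyverse)
library(lubridate)

# Shortened subset of the example data, two files per month
myfiles <- tribble(
  ~files,               ~last_modified,
  "file_2014_01.csv",   "2019-07-17T14:00:27.000Z",
  "file_2014_01 ",      "2019-07-17T14:00:21.000Z",
  "qwerfile_2014_03 ",  "2019-07-15T14:00:21.000Z",
  "uufile_2014_03.csv", "2019-2-17T15:00:23.000Z"
)

latest <- myfiles %>%
  mutate(group = str_extract(files, "\\d{4}.\\d{2}"),
         last_modified = ymd_hms(last_modified)) %>%
  group_by(group) %>%
  # Keep only the most recently modified file in each group
  slice_max(last_modified, n = 1, with_ties = FALSE) %>%
  ungroup()

# File names of the selected rows, with surrounding whitespace removed
latest_names <- latest %>% pull(files) %>% str_trim()
latest_names
```

Whether to trim the names depends on how the files are read afterwards: if the trailing space is part of the real file name on disk, the untrimmed value is the one to pass to the reader.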
