Reading and updating lines

Suppose I have the following files:

library(tidyverse)

# Toy data

## toy file 1
write_lines(c("510020221015123456.00000", 
  "510020221016456456.00000", 
  "510020221017678456.00000"), "abc_w20220111.txt")

## toy file 2
write_lines(c("510020221115123456.00000", 
             "510020221116456456.00000", 
             "510020221117678456.00000"), "abc_w20220112.txt")

Each line in a file carries a date in characters 5 to 12. For example,

read_lines("abc_w20220111.txt") |> 
  as_tibble() |> 
  mutate(date = str_sub(value, 5, 12))
value                    date
510020221015123456.00000 20221015
510020221016456456.00000 20221016
510020221017678456.00000 20221017

I want to drop the lines whose date appears in a vector of unwanted dates, for example
dates_not_wanted <- c(20221015, 20221117)
and then write the text files out again, keeping the same name but adding the suffix _updated. The code below does what I want, but it is not efficient. How can I achieve the result more programmatically? Imagine I have 1,000 files to update.

dates_not_wanted <- c(20221015, 20221117)

read_lines("abc_w20220111.txt") |> 
  as_tibble() |> 
  mutate(date = str_sub(value, 5, 12)) |> 
  filter(!date %in% dates_not_wanted) |> 
  select(-date) |> 
  pull() |> 
  write_lines("abc_w20220111_updated.txt")

read_lines("abc_w20220112.txt") |> 
  as_tibble() |> 
  mutate(date = str_sub(value, 5, 12)) |> 
  filter(!date %in% dates_not_wanted) |> 
  select(-date) |> 
  pull() |> 
  write_lines("abc_w20220112_updated.txt")

You could try this approach. First, create a vector of file paths called myfiles (by specifying the path of the folder that holds the files), list the dates not wanted, and then run the read_and_write function on every file using walk().

# list all files to process (full paths)
myfiles = list.files(path = 'yourfolderpath', pattern = 'abc_w', full.names = TRUE)

# dates not wanted
dates_not_wanted <- c(20221015, 20221117)

# function to read and write new files
read_and_write = function(i) {
  # escape the dot and anchor at the end so only the file extension is replaced
  new_name = str_replace(i, '\\.txt$', '_updated.txt')
  
  read_lines(i) |> 
    as_tibble() |> 
    mutate(date = str_sub(value, 5, 12)) |> 
    filter(!date %in% dates_not_wanted) |> 
    select(-date) |> 
    pull() |> 
    write_lines(new_name)
}

# execute for all files
walk(myfiles, read_and_write)
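
As a possible variation on the function above: since read_lines() already returns a character vector, the filter can be applied to that vector directly, without the round trip through a tibble. The sketch below assumes the date always sits in characters 5 to 12; keep_wanted is a hypothetical helper name, not part of any package. It also stores the unwanted dates as strings, so that %in% compares text with text instead of relying on numbers being converted to text.

```r
library(stringr)

# Unwanted dates stored as character strings (an assumption of this sketch;
# the original post stores them as numbers).
dates_not_wanted <- c("20221015", "20221117")

# Hypothetical helper: keep only lines whose date (characters 5-12)
# is not in `dates`.
keep_wanted <- function(lines, dates) {
  lines[!str_sub(lines, 5, 12) %in% dates]
}

# Toy lines taken from the first example file
toy <- c("510020221015123456.00000",
         "510020221016456456.00000",
         "510020221017678456.00000")

# Drops the first line (20221015) and keeps the other two
keep_wanted(toy, dates_not_wanted)
```

Inside read_and_write, the pipeline would then shrink to read_lines(i) |> keep_wanted(dates_not_wanted) |> write_lines(new_name).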

Maybe something like this?

list.files(pattern = "\\.txt$") %>% 
    walk(~{
        read_lines(.x) |> 
            as_tibble() |> 
            mutate(date = str_sub(value, 5, 12)) |> 
            filter(!date %in% dates_not_wanted) |> 
            select(-date) |> 
            pull() |> 
            write_lines(paste0(str_remove(.x, "\\.txt$"), "_updated.txt"))
    })
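
One thing that may matter with 1,000 files: a pattern such as "\\.txt$" also matches the _updated.txt files once they exist, so running the code a second time would produce files ending in _updated_updated.txt. A minimal sketch of one way to guard against that, assuming the _updated suffix is only ever added by this script; update_folder and demo_dir are hypothetical names used for illustration, and the demo writes to a temporary folder.

```r
library(readr)
library(stringr)
library(purrr)

# Hypothetical demo folder with one toy file
demo_dir <- file.path(tempdir(), "abc_demo")
unlink(demo_dir, recursive = TRUE)
dir.create(demo_dir)
write_lines(c("510020221015123456.00000", "510020221016456456.00000"),
            file.path(demo_dir, "abc_w20220111.txt"))

dates_not_wanted <- c("20221015", "20221117")

# Hypothetical helper: process every .txt file in `dir`,
# skipping files that were already written by an earlier run.
update_folder <- function(dir, dates) {
  files  <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)
  inputs <- files[!grepl("_updated\\.txt$", files)]
  walk(inputs, function(path) {
    lines <- read_lines(path)
    kept  <- lines[!str_sub(lines, 5, 12) %in% dates]
    write_lines(kept, str_replace(path, "\\.txt$", "_updated.txt"))
  })
  invisible(inputs)
}

update_folder(demo_dir, dates_not_wanted)
update_folder(demo_dir, dates_not_wanted)  # second run overwrites, adds no new files
list.files(demo_dir)
```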

@scottyd22 @andresrcs Thank you, both, for your excellent solutions. I not only find them very helpful but also inspiring!
