Reading and updating lines

budugulo · December 21, 2022, 5:09pm

Suppose I have the following files:

library(tidyverse)

# Toy data

## toy file 1
write_lines(c("510020221015123456.00000", 
  "510020221016456456.00000", 
  "510020221017678456.00000"), "abc_w20220111.txt")

## toy file 2
write_lines(c("510020221115123456.00000", 
             "510020221116456456.00000", 
             "510020221117678456.00000"), "abc_w20220112.txt")

Each line in a file contains a date in characters 5 to 12. For example,

read_lines("abc_w20220111.txt") |> 
  as_tibble() |> 
  mutate(date = str_sub(value, 5, 12))

value	date
510020221015123456.00000	20221015
510020221016456456.00000	20221016
510020221017678456.00000	20221017

I want to drop the lines whose date appears in a given vector. For example,
dates_not_wanted <- c(20221015, 20221117)
I then want to write the remaining lines back to text files, keeping the same names but with _updated appended. The code below does what I want, but it is not efficient. How can I achieve the result more programmatically? Imagine I have 1,000 files to update.

dates_not_wanted <- c(20221015, 20221117)

read_lines("abc_w20220111.txt") |> 
  as_tibble() |> 
  mutate(date = str_sub(value, 5, 12)) |> 
  filter(!date %in% dates_not_wanted) |> 
  select(-date) |> 
  pull() |> 
  write_lines("abc_w20220111_updated.txt")

read_lines("abc_w20220112.txt") |> 
  as_tibble() |> 
  mutate(date = str_sub(value, 5, 12)) |> 
  filter(!date %in% dates_not_wanted) |> 
  select(-date) |> 
  pull() |> 
  write_lines("abc_w20220112_updated.txt")
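A side note on the filter step above: str_sub() returns character strings, while dates_not_wanted is numeric. %in% coerces both sides to a common type (character here), so the comparison works for these 8-digit values. Storing the dates as strings avoids relying on that coercion. A minimal base R sketch with the toy lines (the names toy_lines, keep_num and keep_chr are only illustrative):

```r
# Toy lines from the first file
toy_lines <- c("510020221015123456.00000",
               "510020221016456456.00000",
               "510020221017678456.00000")

# Characters 5 to 12 hold the date; the result is a character vector
dates <- substr(toy_lines, 5, 12)

# Numeric vs. character versions of the unwanted dates
keep_num <- !dates %in% c(20221015, 20221117)      # relies on implicit coercion
keep_chr <- !dates %in% c("20221015", "20221117")  # explicit strings

# Both give the same answer for these values
print(identical(keep_num, keep_chr))
print(toy_lines[keep_chr])
```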

scottyd22 · December 21, 2022, 9:04pm

You could try this approach. First, create a vector of file paths called myfiles (by pointing list.files() at the folder that holds all the files), list the dates not wanted, and then run the read_and_write function on every file using walk().

# list all files to process (full paths)
myfiles = list.files(path = 'yourfolderpath', pattern = 'abc_w', full.names = TRUE)

# dates not wanted
dates_not_wanted <- c(20221015, 20221117)

# function to read and write new files
read_and_write = function(i) {
  # escape the dot and anchor at the end so only the .txt extension is replaced
  new_name = str_replace(i, '\\.txt$', '_updated.txt')
  
  read_lines(i) |> 
    as_tibble() |> 
    mutate(date = str_sub(value, 5, 12)) |> 
    filter(!date %in% dates_not_wanted) |> 
    select(-date) |> 
    pull() |> 
    write_lines(new_name)
}

# execute for all files
walk(myfiles, read_and_write)
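A possible variation, sketched in base R so it runs without extra packages: because the lines are already a character vector, the filter can be applied to them directly without converting to a tibble first. The helper name drop_unwanted_lines, the temporary folder, and the stricter file pattern are assumptions made for this illustration, not part of the answer above.

```r
# Hypothetical helper: drop lines whose date (characters 5 to 12) is unwanted,
# then write the remaining lines to "<name>_updated.txt" next to the original.
drop_unwanted_lines <- function(path, dates_not_wanted) {
  lines <- readLines(path)
  keep <- !substr(lines, 5, 12) %in% as.character(dates_not_wanted)
  new_path <- sub("\\.txt$", "_updated.txt", path)
  writeLines(lines[keep], new_path)
  invisible(new_path)
}

# Toy data written to a temporary folder (illustrative only)
demo_dir <- tempfile("demo_")
dir.create(demo_dir)
writeLines(c("510020221015123456.00000",
             "510020221016456456.00000",
             "510020221017678456.00000"),
           file.path(demo_dir, "abc_w20220111.txt"))
writeLines(c("510020221115123456.00000",
             "510020221116456456.00000",
             "510020221117678456.00000"),
           file.path(demo_dir, "abc_w20220112.txt"))

dates_not_wanted <- c("20221015", "20221117")

# Match only the original files, not any *_updated.txt from an earlier run
myfiles <- list.files(demo_dir, pattern = "^abc_w[0-9]+\\.txt$", full.names = TRUE)

# Process every file; collect the paths of the files that were written
updated <- vapply(myfiles, drop_unwanted_lines, character(1),
                  dates_not_wanted = dates_not_wanted)
print(basename(updated))
```

Passing dates_not_wanted as an argument, rather than reading it from the global environment, keeps the function usable with different date lists.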

andresrcs · December 21, 2022, 9:06pm

Maybe something like this?

list.files(pattern = "\\.txt$") %>% 
    walk(~{
        read_lines(.x) |> 
            as_tibble() |> 
            mutate(date = str_sub(value, 5, 12)) |> 
            filter(!date %in% dates_not_wanted) |> 
            select(-date) |> 
            pull() |> 
            write_lines(paste0(str_remove(.x, "\\.txt$"), "_updated.txt"))
    })
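One caveat, assuming the code may be run more than once in the same folder: the pattern "\\.txt$" also matches the _updated.txt files written by an earlier run, so a second run would process those too. A minimal base R sketch of excluding them before processing (the temporary folder, the empty toy files, and the name to_process are only for illustration):

```r
# Simulate a folder that already contains one output from a previous run
demo_dir <- tempfile("rerun_")
dir.create(demo_dir)
invisible(file.create(file.path(demo_dir, c("abc_w20220111.txt",
                                            "abc_w20220111_updated.txt",
                                            "abc_w20220112.txt"))))

# All .txt files, including the earlier output
all_txt <- list.files(demo_dir, pattern = "\\.txt$")

# Keep only files that do not already end in _updated.txt
to_process <- all_txt[!grepl("_updated\\.txt$", all_txt)]
print(to_process)
```

The resulting vector could then be piped into walk() in place of the raw list.files() output.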

budugulo · December 22, 2022, 8:48pm

@scottyd22 @andresrcs Thank you, both, for your excellent solutions. I not only find them very helpful but also inspiring!
