How to automate multiple csv files and how to work on them

Sheba · April 18, 2022, 3:52pm

Hi,

Please I need help to automate this process. I have multiple csv files for each pollutant. So for a folder called "C02", it has about 50 csv files ( each csv file corresponds to a year), the other pollutant have similar number of csv files. The attached is the code I have been using. I want to automate this ( for the csv files for each pollutant) since that is going to save time and resources.

#===== 1980 ======
hap1980 <- read.csv("daily_HAPS_1980.csv")

t80 <- hap1980[hap1980$State.Name %in% c("Michigan", "Ohio","Indiana","Pennsylvania") , c('State.Name','County.Code','County.Name', 'Longitude', 'Latitude','Arithmetic.Mean')]

cord <- c("Latitude", "Longitude")
t80$GrpHAP <- do.call(paste,c(t80[cord], sep=" , "))

tm80 <- t80 %>%
group_by(State.Name,County.Name,Latitude,Longitude, GrpHAP) %>%
summarise(Trimmed_mean= mean(Arithmetic.Mean, trim=0.1), Average = mean(Arithmetic.Mean))

write.csv(tm80,"path\Timmedmeans_1980.csv", row.names = FALSE)

#=========================== 1981 =================================

hap1981 <- read.csv("daily_HAPS_1981.csv")

t81 <- hap1981[hap1981$State.Name %in% c("Michigan", "Ohio","Indiana","Pennsylvania") , c('State.Name','County.Code','County.Name', 'Longitude', 'Latitude','Arithmetic.Mean')]

cord <- c("Latitude", "Longitude")
t81$GrpHAP <- do.call(paste,c(t81[cord], sep=" , "))

tm81 <- t81 %>%
group_by(State.Name,County.Name,Latitude,Longitude, GrpHAP) %>%
summarise(Trimmed_mean= mean(Arithmetic.Mean, trim=0.1), Average = mean(Arithmetic.Mean))

write.csv(tm81,"path\Timmedmeans_1981.csv", row.names = FALSE)

#=========================== 1982 =================================

hap1982 <- read.csv("daily_HAPS_1982.csv")

t82 <- hap1982[hap1982$State.Name %in% c("Michigan", "Ohio","Indiana","Pennsylvania") , c('State.Name','County.Code','County.Name', 'Longitude', 'Latitude','Arithmetic.Mean')]

cord <- c("Latitude", "Longitude")
t82$GrpHAP <- do.call(paste,c(t82[cord], sep=" , "))

tm82 <- t82 %>%
group_by(State.Name,County.Name,Latitude,Longitude, GrpHAP) %>%
summarise(Trimmed_mean= mean(Arithmetic.Mean, trim=0.1), Average = mean(Arithmetic.Mean))

write.csv(tm82,"path\Timmedmeans_1982.csv", row.names = FALSE)

#=========================== 1983 =================================

hap1983 <- read.csv("daily_HAPS_1983.csv")

t83 <- hap1983[hap1983$State.Name %in% c("Michigan", "Ohio","Indiana","Pennsylvania") , c('State.Name','County.Code','County.Name', 'Longitude', 'Latitude','Arithmetic.Mean')]

cord <- c("Latitude", "Longitude")
t83$GrpHAP <- do.call(paste,c(t83[cord], sep=" , "))

tm83 <- t83 %>%
group_by(State.Name,County.Name,Latitude,Longitude, GrpHAP) %>%
summarise(Trimmed_mean= mean(Arithmetic.Mean, trim=0.1), Average = mean(Arithmetic.Mean))

write.csv(tm83,"path\Timmedmeans_1983.csv", row.names = FALSE)

AlexisW · April 19, 2022, 1:17am

There are many ways to do it, I think the simplest is with an explicit for loop.

First, make a list of all the files that need processing:

all_files <- list.files("path",  pattern = "csv$")

for(current_file in all_files){
   hap <- read.csv(current_file)

 [...]

  write.csv(tm, paste0("path/Timmedmeans_", current_file, ".csv"), row.names = FALSE)
}

And you fill in the code for a single file within the for loop.

You can find a lot more explanations in r4ds.

Sheba · April 19, 2022, 8:13pm

Thank you AlexisW. I would give it a try and let you know how it goes

Sheba · April 20, 2022, 3:16am

Hi ,

I tried the code but i keep getting errors. Please be patient if the error seems obvious, I'm very new to R and to for loops in general.

all_files <- list.files("path", pattern = "csv$")

results <- vector("list", length(current_file))
for(current_file in all_files){
hap <- read.csv(current_file)
count <- nrow(hap)
t82 <- hap[hap$State.Name %in% c("Michigan", "Ohio","Indiana","Pennsylvania") , c('State.Name','County.Code','County.Name', 'Longitude', 'Latitude','Arithmetic.Mean')]

cord <- c("latitude", "longitude")
t82$grphap <- do.call(paste,c(t82[cord], sep=" , "))

tm82 <- t82 %>%
group_by(state.name,county.name,latitude,longitude, grphap) %>%
summarise(trimmed_mean= mean(arithmetic.mean, trim=0.1), average = mean(arithmetic.mean))

write.csv(tm, paste0("path/timmedmeans_", current_file, ".csv"), row.names = FALSE)
results[current_file] <- count
}

williaml · April 20, 2022, 3:24am

Have a look at this perhaps?

Sheba · April 20, 2022, 3:26am

Yes williaml I have . I tried it but it didn't get the right output

williaml · April 20, 2022, 3:26am

We don't have any of your files anyway, so can't reproduce your issue. It might help if you posted the error that you are having.

Sheba · April 20, 2022, 12:43pm

Thank you guys. It is working now. Yyaayyyyyyy . Thank you !!!!!!

neuwirthe · April 20, 2022, 4:22pm

Here is what I do in a similar situation

file_template <- 
  "https://info.gesundheitsministerium.gv.at/data/archiv/COVID19_vaccination_doses_agegroups_xxxxxxxx.csv"

seq(from=as.Date("2021-10-29"),to=Sys.Date(),by="1 day") |>
  as.character() |>
  str_remove_all("-") ->
  date_seq

str_replace(file_template,"xxxxxxxx",date_seq) ->
  files_to_get

map(files_to_get,
    \(f)read_csv2(f)
    ) |>
  reduce(bind_rows) ->
  vac_age_doses_data

Sivuyile · April 22, 2022, 5:19am

Another approach is to store the list of csvs in a dataframe see example below.

library(tidyverse)
data.frame(co2_path = list.files(full.names=TRUE, pattern= ".csv")) %>%
mutate(co2_files = map(co2_path, read_csv))%>%
unnest(coa_files)%>%
filter('State Name' %in% c ("Michigan","Ohio", "Indiana", "Pennsylvania"))

This way, you will have one object to conduct analysis on. It will help you if you need to check your code later for one reason or the other.

Javielioi · May 5, 2022, 8:42pm

Hi, for big files I use data.table (fread) and furrr (future_map_dfr) packages. Something like this:

library(data.table)
library(furrr)

csv_files_path <- list.files(path = "your_folder_path", pattern = "*.csv", full.names = TRUE)

plan(multisession)
df <- future_map_dfr(csv_files_path, ~fread(.x))

Does it help?

Javi

system · May 26, 2022, 8:42pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.