Need some help with a function to read multiple CSV files and calculate the mean() of columns, ignoring NA

pollutantmean <- function(directory, pollutant, ind)
  {
  directory <- "specdata"
  z <- list.files("directory")
  print(z)
  ind
  i <- ind[1]
  totaal <- read.table(z[1])
  hulpspec <- data.frame()
  for (i in ind){
    i <- i+1
    hulpspec <- read.table(z[i])
    totaal <- rbind(totaal,hulpspec)
  }
  mean(totaal[ind], na.rm == FALSE)
}
**This gave me the following error message on Windows 10 Home Edition when executing pollutantmean("specdata", "sulfate", 1:10):**
Error in file(file, "rt") : invalid 'description' argument
4.
file(file, "rt")
3.
read.table(file = file, header = header, sep = sep, quote = quote,
    dec = dec, fill = fill, comment.char = comment.char, ...)
2.
read.csv(z[1]) at pollutantmean.R#4
1.
pollutantmean("specdata", "sulfate", 1:10)

What did I do wrong? Thanks a lot for any help.
Nobel

When you call

pollutantmean("specdata","sulfate", 1:10)

the variable directory within pollutantmean gets the value "specdata". So,

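  1. There is no need to set the value of directory to "specdata" within the function.
  2. When you call
     z <- list.files("directory")
     the term directory should not be in quotes. With quotes, list.files() looks for a folder literally named "directory", which most likely does not exist, so it returns an empty vector. Then z[1] is NA, and reading it fails with "invalid 'description' argument". Also pass full.names = TRUE, so that the returned file names include the folder path and can be read from any working directory.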
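A minimal sketch of the difference, run against a temporary folder with two made-up CSV files (the folder and its contents are hypothetical stand-ins for specdata). It assumes there is no folder named directory in your working directory.

```r
# Hypothetical demo folder with two small CSV files (made-up stand-ins for specdata)
demo_dir <- tempfile("specdata_demo_")
dir.create(demo_dir)
write.csv(data.frame(sulfate = c(1, NA)), file.path(demo_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(sulfate = c(3, 4)), file.path(demo_dir, "002.csv"), row.names = FALSE)

directory <- demo_dir

# Quoted: R looks for a folder literally called "directory". If none exists in
# the working directory, the result is character(0), so z_wrong[1] is NA.
z_wrong <- list.files("directory")

# Unquoted, with full.names = TRUE: complete paths that read.csv() can open
# regardless of the current working directory.
z_right <- list.files(directory, full.names = TRUE)

# Reading a missing (NA) file name reproduces the error from the question
err <- tryCatch(read.csv(character(0)[1]), error = function(e) conditionMessage(e))
print(err)   # "invalid 'description' argument"

# The correct paths can be read directly
first <- read.csv(z_right[1])
```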

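Also, it is a bad idea to manually increment the i variable within the for loop: i <- i + 1 shifts every index by one, so with ind = 1:10 the loop reads z[2] to z[11] instead of z[1] to z[10]. Let the for loop do the incrementing.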
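A small sketch of the effect, recording which index each iteration would use instead of actually reading files (the vectors read_manual and read_loop are hypothetical). In R, for reassigns i from ind at the start of every iteration, so i <- i + 1 does not skip iterations; it only shifts which index the rest of the loop body uses.

```r
ind <- 1:3

# With the manual increment from the question
read_manual <- c()
for (i in ind) {
  i <- i + 1                        # shifts the index used below by one
  read_manual <- c(read_manual, i)  # stands in for read.table(z[i])
}
read_manual   # 2 3 4: the loop uses indices 2 to 4 instead of 1 to 3

# Letting the loop manage i
read_loop <- c()
for (i in ind) {
  read_loop <- c(read_loop, i)
}
read_loop     # 1 2 3
```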

Finally, I do not think you want to write

mean(totaal[ind], na.rm == FALSE)

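because ind holds the file numbers, not a column of your data. Don't you want to use the parameter pollutant there? Also, na.rm == FALSE compares na.rm with FALSE instead of setting the argument; use a single = for that. And since your title says you want to ignore NA, use na.rm = TRUE.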
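A small illustration with a made-up data frame standing in for totaal (the values are hypothetical). Selecting the pollutant column gives a numeric vector that mean() can average once na.rm = TRUE drops the NA, while selecting columns by number gives a data frame, for which mean() only warns and returns NA.

```r
# Hypothetical stand-in for the combined data (not the real specdata values)
totaal <- data.frame(sulfate = c(1, NA, 3), nitrate = c(2, 4, NA))
pollutant <- "sulfate"

mean(totaal[, pollutant])                # NA: the missing value propagates
mean(totaal[, pollutant], na.rm = TRUE)  # 2: the NA is dropped before averaging

# totaal[1:2] selects whole columns, i.e. a data frame rather than a numeric
# vector; mean() warns "argument is not numeric or logical" and returns NA
suppressWarnings(mean(totaal[1:2]))
```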

I would write the function like this.

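pollutantmean <- function(directory, pollutant, ind) {
  # full paths, so the files can be read from any working directory
  z <- list.files(directory, full.names = TRUE)
  totaal <- data.frame()
  for (i in ind) {
    hulpspec <- read.csv(z[i])               # the data files are CSVs
    totaal <- rbind(totaal, hulpspec)        # stack the selected files
  }
  mean(totaal[, pollutant], na.rm = TRUE)    # ignore missing values
}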

I have not tested that, since I do not have your data, so it may have mistakes.
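One way to check the approach without the real data is to run it on a small temporary folder. The sketch below is a base-R variant, not the function above: the name pollutantmean_base is hypothetical, and it swaps the loop for lapply() plus a single do.call(rbind, ...), which avoids growing totaal one file at a time. It assumes the data files end in .csv, and the CSV values are made up.

```r
# Hypothetical base-R variant: read the selected files with lapply() and
# bind them once, instead of growing a data frame inside a for loop.
pollutantmean_base <- function(directory, pollutant, ind) {
  files <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
  pieces <- lapply(files[ind], read.csv)   # one data frame per selected file
  totaal <- do.call(rbind, pieces)         # stack them in a single step
  mean(totaal[[pollutant]], na.rm = TRUE)  # ignore missing values
}

# Usage on a small hypothetical folder (made-up values, not the real specdata)
demo_dir <- tempfile("specdata_demo_")
dir.create(demo_dir)
write.csv(data.frame(sulfate = c(1, NA, 3)), file.path(demo_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(sulfate = c(5, NA)), file.path(demo_dir, "002.csv"), row.names = FALSE)
write.csv(data.frame(sulfate = c(100, 200)), file.path(demo_dir, "003.csv"), row.names = FALSE)

pollutantmean_base(demo_dir, "sulfate", 1:2)  # 3, the mean of 1, 3 and 5
```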

Here is an alternative solution that removes the need for the ind variable; it averages over every file in the folder.

This assumes that "sulfate" is the name of a column:

library(tidyverse)

pollutantmean_1 <- function(directory, col) {
  list.files(directory, full.names = TRUE) %>%
    map_dfr(read.table, sep = ",", header = TRUE) %>% # CSV files; change sep as required
    summarise(mean({{col}}, na.rm = TRUE)) %>%
    pull()
}
pollutantmean_1("specdata", sulfate) # no quotes for the column name


pollutantmean_2 <- function(directory, col) {
  list.files(directory, full.names = TRUE) %>%
    map_dfr(read.table, sep = ",", header = TRUE) %>% # CSV files; change sep as required
    summarise(mean(!!sym(col), na.rm = TRUE)) %>%
    pull()
}
pollutantmean_2("specdata", "sulfate") # quotes for the column name

Thanks for replying, I learned a lot from your solution.
