How to calculate mean per row range

I'm trying to calculate the mean per row range.
I want to generate monthly averages for each monitoring station from hourly data and eliminate the rows where the concentration is NA.

Here is an example of my dataframe

This sounds like a classic dplyr use case for group_by() and summarize(). Something along these lines:

library(dplyr)
my_df %>%
  group_by(month) %>%
  summarize(monthly_average = mean(concentration, na.rm = TRUE))

If needed, you can split the fetcha column with tidyr::separate() to create the month column.
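A minimal sketch of that separate() step, in case it helps. The toy data frame and the "yyyy-mm-dd" text format of fetcha are assumptions on my part (the real data is not shown here), so adjust sep and the order of the new columns to match your file:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data; fetcha is assumed to be text in "yyyy-mm-dd" form
my_df <- data.frame(
  fetcha = c("2020-01-01", "2020-01-01", "2020-02-01"),
  hora = c(1, 2, 1),
  concentration = c(10, NA, 30)
)

monthly <- my_df %>%
  # split fetcha into three text columns, keeping the original column
  separate(fetcha, into = c("year", "month", "day"), sep = "-", remove = FALSE) %>%
  group_by(month) %>%
  summarize(monthly_average = mean(concentration, na.rm = TRUE))
```

Note that separate() returns the pieces as character columns, so month is "01" and "02" here rather than a number.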

Computing the average for every column (AJU, CAM, ... MPA) is a bit harder: you need to use across(). See this example from ?across, which looks a lot like what you need:

iris %>%
  group_by(Species) %>%
  summarise(across(starts_with("Sepal"), ~mean(.x, na.rm = TRUE)))
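Adapted to station columns, the same pattern might look like the sketch below. The values are made up, and I am assuming the station columns sit next to each other so that AJU:MPA selects all of them:

```r
library(dplyr)

# Hypothetical toy data with a ready-made month column
my_df <- data.frame(
  month = c(1, 1, 2, 2),
  AJU = c(10, NA, 30, 50),
  CAM = c(1, 3, NA, NA),
  MPA = c(2, 4, 6, 8)
)

station_means <- my_df %>%
  group_by(month) %>%
  # one mean per station column, ignoring NA values
  summarise(across(AJU:MPA, ~ mean(.x, na.rm = TRUE)))
```

One thing to watch for: when a station has only NA values in a month (CAM in month 2 here), mean(..., na.rm = TRUE) returns NaN rather than dropping that cell.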

The data you have is what we call wide data: the names of the monitoring stations are in the columns. It would be easier to calculate what you want if the station names were in a column of their own. You can convert from wide to long using the tidyr::pivot_longer() function as shown below.

myData <- myData %>% 
          tidyr::pivot_longer(cols = -(c(fetcha, hora)), names_to = "Site", values_to = "Value")

Then create the Month column with lubridate::month(), group by Month and Site, and summarise with mean(Value):

monthly_averages <- myData %>%
      # this assumes fetcha is stored as a Date or date-time
      mutate(Month = lubridate::month(fetcha)) %>%
      group_by(Month, Site) %>%
      # the na.rm argument ignores the NA values while calculating the averages
      summarise(Monthly_Avg = mean(Value, na.rm = TRUE), .groups = "drop")
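If fetcha is stored as text, or the data spans more than one year, a variation along these lines may be safer. This is only a sketch: the "dd/mm/yyyy" format and the sample values are assumptions, so swap dmy() for ymd() or mdy() as needed. It drops the NA rows explicitly with filter(), as asked in the question, and uses lubridate::floor_date() so that January 2020 and January 2021 are not averaged together:

```r
library(dplyr)
library(lubridate)

# Hypothetical long-format data; fetcha is assumed to be "dd/mm/yyyy" text
myData <- data.frame(
  fetcha = c("01/01/2020", "15/01/2020", "01/02/2020", "01/01/2021"),
  Site = "AJU",
  Value = c(10, 20, NA, 40)
)

monthly <- myData %>%
  filter(!is.na(Value)) %>%                       # remove rows with NA concentration
  mutate(fetcha = dmy(fetcha),                    # parse the text into Date values
         Month = floor_date(fetcha, "month")) %>% # first day of each calendar month
  group_by(Month, Site) %>%
  summarise(Monthly_Avg = mean(Value, na.rm = TRUE), .groups = "drop")
```

With filter() in place, a month that has no valid readings simply does not appear in the result, instead of showing up as NaN.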

You can then convert it back to wide format using the tidyr::pivot_wider() function.

monthly_averages  %>% tidyr::pivot_wider(names_from = Site, values_from = Monthly_Avg)