How to use a function to manipulate a data frame in R

saurabh255 · November 14, 2020, 1:51pm

Hi, I have the following R code, which creates a data frame:

library(lubridate)
library(tibble)
library(aimsir17)
library(dplyr)
library(ggplot2)



my_obs <- subset(observations,
                 station %in% c("DUBLIN AIRPORT","MACE HEAD","SherkinIsland","BELMULLET"))

Now, after creating the my_obs data frame, I want to write a function that extracts specific values from it by passing 4 parameters, like the following:

get_daily_summary <- function(dayNo, stat, attribute, f){
  
}

The function can access my_obs from the global workspace

• It takes in 4 params

– dayNo (1-365)

– stat ("DUBLIN AIRPORT","MACE HEAD","SherkinIsland","BELMULLET")

– attribute ("rain", "temp", "msl", "wdsp")

– f (a function) – (min, max, mean, sum)

• This function “collapses” 24 daily observations for a given station on a
particular day number to one summary value calculated by f.

But I am not sure what to write inside the curly brackets to access specific fields of the data frame, as I am very new to R. Can somebody please tell me how to write a function that accesses the specific fields mentioned above?

AlexisW · November 15, 2020, 9:18pm

First, let's generate some example data:

library(dplyr)

set.seed(1)

my_obs <- expand.grid(time = 1:24,
                      day = 1:365,
                      airport = LETTERS[1:4]) %>%
  bind_cols(rain = round(rnorm(24*365*4, 70, 20), 2),
            temp = round(runif(24*365*4, -5, 35), 2))

head(my_obs)
#   time day airport   rain  temp
# 1    1   1       A  57.47 20.45
# 2    2   1       A  73.67 32.96
# 3    3   1       A  53.29 -3.73
# 4    4   1       A 101.91 28.28
# 5    5   1       A  76.59 -1.39
# 6    6   1       A  53.59 21.84

Now, if you're using dplyr, it is quite easy to obtain the kind of summary that you want using filter(), group_by(), and summarize(). For example:

my_obs %>%
  filter(day == 5) %>%
  group_by(airport) %>%
  summarize(mean_rain = mean(rain),
            min_temp = min(temp))
# A tibble: 4 x 3
#   airport mean_rain min_temp
#   <fct>       <dbl>    <dbl>
# 1 A            68.9    -4.71
# 2 B            65.7    -4.65
# 3 C            81.7    -4.6 
# 4 D            73.3    -3.8

You can easily put that in a function selecting the day and station and computing a predefined statistic:

my_func <- function(d, s){
  my_obs %>%
    filter(day == d, airport == s) %>%
    summarize(summ = mean(rain))
}

The tricky part is using dplyr functions with parameters; for that, you should read this guide. Your case is quite simple: you just need to "embrace" the attribute argument with {{ }}, whereas the function f can be passed as is:

my_func <- function(d, s, a, f){
  my_obs %>%
    filter(day == d, airport == s) %>%
    summarize(summ = f({{a}}))
}

my_func(5, "A", rain, mean)
#    summ
# 1 68.94
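
If you would rather keep the exact signature from the question, with the attribute passed as a string ("rain") instead of a bare column name, one option is dplyr's .data pronoun in place of embracing. Below is a minimal sketch, not something I have run against aimsir17 itself: the small data frame is a made-up stand-in, and I am assuming the real observations table has a station column and a date-time column called date. The day number is derived from that column with base R's format(); the day_no and value column names are just illustrative.

```r
library(dplyr)

# Made-up stand-in for the aimsir17 subset (assumed columns: station, date,
# rain, temp): two stations, hourly readings for 1-3 January 2017.
hours <- seq(as.POSIXct("2017-01-01 00:00", tz = "UTC"),
             by = "hour", length.out = 72)

my_obs <- data.frame(
  station = rep(c("MACE HEAD", "BELMULLET"), each = 72),
  date    = rep(hours, times = 2),
  rain    = rep(c(0, 0.5), length.out = 144),  # alternating dry/wet hours
  temp    = rep(1:24, times = 6)               # 1..24 within every day
)

get_daily_summary <- function(dayNo, stat, attribute, f) {
  my_obs %>%
    # Day of year (1-365) taken from the date-time column
    mutate(day_no = as.integer(format(date, "%j", tz = "UTC"))) %>%
    # Keep the 24 hourly rows for the requested day and station
    filter(day_no == dayNo, station == stat) %>%
    # .data[[attribute]] looks the column up by its name given as a string
    summarize(value = f(.data[[attribute]])) %>%
    # Return a single number rather than a one-row data frame
    pull(value)
}

get_daily_summary(2, "MACE HEAD", "temp", mean)
# [1] 12.5
get_daily_summary(3, "BELMULLET", "rain", sum)
# [1] 6
```

If the real data contains missing values, f will return NA for those days unless you wrap it, for example function(x) mean(x, na.rm = TRUE).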
