Grouping with dplyr

Hi RStudio Community,

I have a short question on grouping in dplyr.
Grouping a data frame by a column is pretty simple. Is it possible to make the grouping column the index of the resulting data frame?

Best regards
Gauss

@gauss193 Could you please elaborate on what you mean? Or explain what exactly it is you are trying to achieve? It wasn't clear to me.

I want to group a dataframe by the date column and get basic statistics over time for my data. The resulting dataframe should include the statistics in the columns and the grouped dates as rownames/index.

You can create a column for that. Let's say you want to do it monthly. Then you can do something like this:

library(dplyr)
library(lubridate)

df |>
  mutate(months = month(date_variable)) |>  # month number (1-12) of each date
  group_by(months) |>
  summarise(nn = n())  # number of rows per month, as an example statistic
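
On the rownames/index part of the question: tibbles do not keep row names, so the usual dplyr approach is to leave the dates in an ordinary column. If row names are really needed, one possible route is to convert the summary to a plain data frame afterwards. This is only a sketch; the data and the names `sales`, `day` and `amount` are made up for illustration.

```r
library(dplyr)

# Hypothetical example data (not from the thread)
sales <- data.frame(
  day    = as.Date(c("2021-01-01", "2021-01-01", "2021-02-01", "2021-02-01")),
  amount = c(10, 20, 30, 50)
)

# One row per date, with the statistics in columns
stats <- sales %>%
  group_by(day) %>%
  summarise(count = n(), mean_amount = mean(amount))

# Tibbles drop row names, so switch to a plain data frame first
stats_df <- as.data.frame(stats)
rownames(stats_df) <- format(stats_df$day)  # dates become character row names
stats_df$day <- NULL                        # remove the now redundant column

stats_df
```

Rows can then be looked up by date string, e.g. `stats_df["2021-02-01", ]`, but further dplyr verbs may discard the row names again.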

Thank you very much for the quick response! One last question.

Let's say I have a function for calculating the 5% quantile:

q05 <- function(var) {
  return(quantile(var, 0.05, na.rm = TRUE))
}

Now I want to call this function when grouping a given data frame:

X = data.frame("myDate" = c("2015-01-01","2016-01-01","2017-01-01","2018-01-01","2019-01-01","2020-01-01"),
                "var1" = c(1,2,3,4,5,6),
                "var2" = c(2,2,2,4,4,4)
)

The following function does not work as I would expect:

myFunc<- function(df, myVar, myDateCol) {

    result = df%>%
    group_by("Date" = df[,myDateCol]) %>%
      summarise("5th quantile" = q05(df[,myVar]))

    return(result)
}

How can I group the dataframe and get my calculations over time?
I already tried tapply here, but I could not get it to work for multiple functions, e.g. q95.

Do you want something like this?

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

DF <- data.frame(MyDate = as.Date(c("2021-01-01","2021-01-01",
                                   "2021-02-01","2021-02-01",
                                   "2021-03-01","2021-03-01")),
                 Value=c(400,335,562,521,456,344))

q05 <- function(var) {
  return(quantile(var, 0.05, na.rm = TRUE))
}

myFunc<- function(df, myVar, myDateCol) {
  
  result = df %>%
    group_by("Date" = {{myDateCol}}) %>%
    summarise("5th quantile" = q05({{myVar}}))
  
  return(result)
}

myFunc(DF, Value, MyDate)
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 3 x 2
#>   Date       `5th quantile`
#>   <date>              <dbl>
#> 1 2021-01-01           338.
#> 2 2021-02-01           523.
#> 3 2021-03-01           350.

Created on 2021-08-20 by the reprex package (v0.3.0)
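
A note on why the earlier attempt gave unexpected results, as far as I can tell: `df[, myVar]` pulls the whole column out of the original data frame, so `summarise()` never sees the per-group subset and every group gets the same number. Embracing the argument with `{{ }}` lets dplyr evaluate it inside each group. The sketch below contrasts the two; the function names `ungrouped_stat` and `grouped_stat` are made up, and it assumes `df` is a plain data frame (not a tibble) so that `df[, myVar]` returns a vector.

```r
library(dplyr)

q05 <- function(var) quantile(var, 0.05, na.rm = TRUE)

DF <- data.frame(
  MyDate = as.Date(c("2021-01-01", "2021-01-01", "2021-02-01", "2021-02-01")),
  Value  = c(400, 335, 562, 521)
)

# Indexing df directly bypasses the grouping: each group is given all rows
ungrouped_stat <- function(df, myVar, myDateCol) {
  df %>%
    group_by(Date = df[, myDateCol]) %>%
    summarise(q = q05(df[, myVar]))
}

# Embracing the argument lets summarise() evaluate it group by group
grouped_stat <- function(df, myVar, myDateCol) {
  df %>%
    group_by(Date = {{ myDateCol }}) %>%
    summarise(q = q05({{ myVar }}))
}

same_everywhere <- ungrouped_stat(DF, "Value", "MyDate")  # one value repeated per date
per_group       <- grouped_stat(DF, Value, MyDate)        # a separate value per date
```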

Thank you so much, that's exactly what I have been looking for!
Is it also possible to pass the column names as strings?

Here is an example that has one of the arguments passed as a bare name and the other passed as a character string. You can learn about this at

library(dplyr)

DF <- data.frame(MyDate= as.Date(c("2021-01-01","2021-01-01",
                                   "2021-02-01","2021-02-01",
                                   "2021-03-01","2021-03-01")),
                 Value=c(400,335,562,521,456,344))

q05 <- function(var) {
  return(quantile(var, 0.05, na.rm = TRUE))
}

myFunc<- function(df, myVar, myDateCol) {
  result = df %>%
    group_by("Date" = .data[[myDateCol]]) %>%
    summarise("5th quantile" = q05({{myVar}}))
  
  return(result)
}

myFunc(DF, Value, "MyDate")
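
The same `.data[[ ]]` pattern should also work when both column names are strings, and `summarise()` can hold several statistics at once, which may cover the `q95` case mentioned earlier. A minimal sketch: `q95`, `myFuncStr` and the output column names `low` and `high` are my own assumptions, not something from the posts above.

```r
library(dplyr)

DF <- data.frame(
  MyDate = as.Date(c("2021-01-01", "2021-01-01",
                     "2021-02-01", "2021-02-01",
                     "2021-03-01", "2021-03-01")),
  Value  = c(400, 335, 562, 521, 456, 344)
)

q05 <- function(var) quantile(var, 0.05, na.rm = TRUE)
q95 <- function(var) quantile(var, 0.95, na.rm = TRUE)  # hypothetical companion to q05

# Both column names are passed as character strings
myFuncStr <- function(df, myVar, myDateCol) {
  df %>%
    group_by(Date = .data[[myDateCol]]) %>%
    summarise(
      low  = q05(.data[[myVar]]),  # 5% quantile within each date
      high = q95(.data[[myVar]])   # 95% quantile within each date
    )
}

res <- myFuncStr(DF, "Value", "MyDate")
res
```

Each extra statistic is just another named argument to `summarise()`, so there is no need to loop over functions as with `tapply`.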

Thank you! That's great :)
