Hi RStudio Community,
I have a short question on grouping in dplyr.
Grouping a data frame by a column is simple enough. Is it possible to make the grouping column the index of the resulting data frame?
Best regards
Gauss
Is it possible to make this group to the index of the resulting dataframe?
@gauss193 Could you please elaborate on what you mean? Or explain what exactly it is you are trying to achieve? It wasn't clear to me.
I want to group a dataframe by the date column and get basic statistics over time for my data. The resulting dataframe should include the statistics in the columns and the grouped dates as rownames/index.
You can create a column for that. Let's say you want to do it monthly. Then you can do something like this:
library(dplyr)
library(lubridate)
df |>
  mutate(months = month(date_variable)) |>
  group_by(months) |>
  summarise(nn = n()) # this gives you the count statistic, for example
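If the grouped values really need to end up as row names rather than as a column, one option is tibble::column_to_rownames() after summarising. Here is a minimal sketch under assumptions: the data, and the names `df`, `date_variable` and `value`, are made up for illustration and are not from the original post.

```r
library(dplyr)
library(lubridate)
library(tibble)

# Hypothetical example data; replace with your own data frame.
df <- data.frame(
  date_variable = as.Date(c("2021-01-05", "2021-01-20",
                            "2021-02-03", "2021-02-17")),
  value = c(10, 20, 30, 50)
)

monthly <- df |>
  mutate(months = month(date_variable)) |>
  group_by(months) |>
  summarise(nn = n(), mean_value = mean(value)) |>
  # Moves the `months` column into the row names and drops the column.
  column_to_rownames("months")

monthly
```

Note that tibbles do not carry row names, so the result here is a plain data.frame. For further dplyr steps it is usually easier to keep the group as a regular column.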
Thank you very much for the quick response! One last question.
Let's say I have a function for calculating the 5th percentile (the 0.05 quantile):
q05 <- function(var) {
  return(quantile(var, 0.05, na.rm = TRUE))
}
Now I want to call this function when grouping a given dataframe:
X = data.frame("myDate" = c("2015-01-01","2016-01-01","2017-01-01","2018-01-01","2019-01-01","2020-01-01"),
               "var1" = c(1,2,3,4,5,6),
               "var2" = c(2,2,2,4,4,4)
)
The following function does not work as I would expect:
myFunc <- function(df, myVar, myDateCol) {
  result = df %>%
    group_by("Date" = df[,myDateCol]) %>%
    summarise("5th quantile" = q05(df[,myVar]))
  return(result)
}
How can I group the dataframe and get my calculations over time?
I already tried tapply here, but I could not get it to work for multiple functions, e.g. q95.
Do you want something like this?
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
DF <- data.frame(MyDate = as.Date(c("2021-01-01","2021-01-01",
                                    "2021-02-01","2021-02-01",
                                    "2021-03-01","2021-03-01")),
                 Value = c(400,335,562,521,456,344))
q05 <- function(var) {
  return(quantile(var, 0.05, na.rm = TRUE))
}
myFunc <- function(df, myVar, myDateCol) {
  result = df %>%
    group_by("Date" = {{myDateCol}}) %>%
    summarise("5th quantile" = q05({{myVar}}))
  return(result)
}
myFunc(DF, Value, MyDate)
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 3 x 2
#> Date `5th quantile`
#> <date> <dbl>
#> 1 2021-01-01 338.
#> 2 2021-02-01 523.
#> 3 2021-03-01 350.
Created on 2021-08-20 by the reprex package (v0.3.0)
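Regarding the tapply problem with several functions such as q95: summarise() accepts any number of name = expression pairs, so each statistic can simply become its own column. A minimal sketch, assuming a `q95` helper defined analogously to `q05`; `q95`, `myStats`, and the column names `p05` and `p95` are hypothetical names chosen for this illustration.

```r
library(dplyr)

DF <- data.frame(MyDate = as.Date(c("2021-01-01", "2021-01-01",
                                    "2021-02-01", "2021-02-01",
                                    "2021-03-01", "2021-03-01")),
                 Value = c(400, 335, 562, 521, 456, 344))

q05 <- function(var) quantile(var, 0.05, na.rm = TRUE)
# Assumed companion helper, not defined in the thread.
q95 <- function(var) quantile(var, 0.95, na.rm = TRUE)

myStats <- function(df, myVar, myDateCol) {
  df %>%
    group_by(Date = {{ myDateCol }}) %>%
    # One output column per statistic; add further lines as needed.
    summarise(p05 = q05({{ myVar }}),
              p95 = q95({{ myVar }}),
              .groups = "drop")
}

res <- myStats(DF, Value, MyDate)
res
```

Setting `.groups = "drop"` also silences the "ungrouping output" message shown in the reprex above.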
Thank you so much, that's exactly what I have been looking for!
Is it also possible to pass the column names as strings?
Here is an example that has one of the arguments passed as a bare name and the other passed as a character string. You can learn about this at
library(dplyr)
DF <- data.frame(MyDate = as.Date(c("2021-01-01","2021-01-01",
                                    "2021-02-01","2021-02-01",
                                    "2021-03-01","2021-03-01")),
                 Value = c(400,335,562,521,456,344))
q05 <- function(var) {
  return(quantile(var, 0.05, na.rm = TRUE))
}
myFunc <- function(df, myVar, myDateCol) {
  result = df %>%
    group_by("Date" = .data[[myDateCol]]) %>%
    summarise("5th quantile" = q05({{myVar}}))
  return(result)
}
myFunc(DF, Value, "MyDate")
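If both column names should be passed as strings, the same `.data[[ ]]` pronoun can be used inside summarise() as well. A minimal sketch along the lines of the function above; `myFuncStr` is a hypothetical name used only to keep it apart from `myFunc`.

```r
library(dplyr)

DF <- data.frame(MyDate = as.Date(c("2021-01-01", "2021-01-01",
                                    "2021-02-01", "2021-02-01",
                                    "2021-03-01", "2021-03-01")),
                 Value = c(400, 335, 562, 521, 456, 344))

q05 <- function(var) quantile(var, 0.05, na.rm = TRUE)

# Both myVar and myDateCol are character strings here.
myFuncStr <- function(df, myVar, myDateCol) {
  df %>%
    group_by(Date = .data[[myDateCol]]) %>%
    summarise(`5th quantile` = q05(.data[[myVar]]), .groups = "drop")
}

res_str <- myFuncStr(DF, "Value", "MyDate")
res_str
```

Because the names are ordinary strings, they can also come from a variable or a loop, for example `for (v in c("var1", "var2")) myFuncStr(X, v, "myDate")` with the data frame from the question.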
Thank you! That's great.