Morning All,
i'm using and learning dbplyr with a Microsoft SQL Server, but i'm in trouble with a simple task.
I want to calculate some descriptive statistic on a variable on all table, so i have not a grouping variable, but and i receive an error. I don't understand very well how to handle dbplyr's group_by clauses that translate SQL's over() clauses.
My lazy query
tbl_query <- tbl(con, "table")
dbplyr query
tbl_query %>%
select(numeric_variable) %>%
summarise(min = min(numeric_variable),
mean = mean(numeric_variable),
median = quantile(numeric_variable, 5),
max = max(numeric_variable),
sum = sum(numeric_variable),
)
the error
Error: <SQL> 'SELECT TOP(11) MIN("numeric_variable") AS "min", AVG("numeric_variable") AS "mean", PERCENTILE_CONT(5.0) WITHIN GROUP (ORDER BY "numeric_variable") AS "median", MAX("numeric_variable") AS "max", SUM("numeric_variable") AS "sum"
FROM (SELECT TOP 100 PERCENT "idconsole", "idconsole" AS "numeric_variable"
FROM "lead") "dbplyr_027"'
nanodbc/nanodbc.cpp:1587: 42000: [Microsoft][ODBC SQL Server Driver][SQL Server]The function 'PERCENTILE_CONT' must have an OVER clause.
the only way is to think in SQL way and calculate all aggregate function as window function and at the end keep only a row? like this
tbl_query %>%
select(numeric_variable) %>%
mutate(min = min(numeric_variable, na.rm = TRUE),
median = quantile(numeric_variable, 0.5),
mean = mean(numeric_variable, na.rm = TRUE),
max = max(numeric_variable, na.rm = TRUE),
sum = sum(numeric_variable, na.rm = TRUE)) %>%
ungroup() %>%
head(n = 1)
there's a way to give in dbplyr different group_by condition for the different aggregation functions?
thanks in advance.
MC