I am trying to run some code that for sure worked last April-ish (2021). It does a summarize() without a group_by() using dbplyr. But it now gives an error. When I look at the SQL generated, it seems it automatically is looking for a grouping variable, but I don't want one - I just want an overall summary. Is there a way around this? Or should I just use SQL for this? Just trying to teach this to students, so it's not that big of a deal. Reprex is below.
library(tidyverse) # for reading in data, graphing, and cleaning library(dbplyr) # for SQL query "cheating" - part of tidyverse but needs to be loaded separately #> #> Attaching package: 'dbplyr' #> The following objects are masked from 'package:dplyr': #> #> ident, sql library(mdsr) # for accessing some databases - goes with Modern Data Science with R textbook library(RMySQL) # for accessing MySQL databases #> Loading required package: DBI library(RSQLite) # for accessing SQLite databases #> #> Attaching package: 'RSQLite' #> The following object is masked from 'package:RMySQL': #> #> isIdCurrent con_air <- dbConnect_scidb("airlines") overall_smry <- tbl(con_air, "flights") %>% summarize(min_year = min(year), max_year = max(year), num_flights = n(), num_carriers = n_distinct(carrier)) overall_smry #> Warning: Missing values are always removed in SQL. #> Use `MIN(x, na.rm = TRUE)` to silence this warning #> This warning is displayed only once per session. #> Warning: Missing values are always removed in SQL. #> Use `MAX(x, na.rm = TRUE)` to silence this warning #> This warning is displayed only once per session. #> Error in .local(conn, statement, ...): could not run statement: Unknown column '' in 'field list' overall_smry %>% show_query() #> <SQL> #> SELECT ``, MIN(`year`) AS `min_year`, MAX(`year`) AS `max_year`, COUNT(*) AS `num_flights`, COUNT(DISTINCT `carrier`) AS `num_carriers` #> FROM `flights` #> GROUP BY ``
Created on 2021-09-27 by the reprex package(v2.0.0)