why no dbplyr support for MySQL/MariaDB windowing functions?

I use MariaDB 10.3.11 and since 10.2 it's supported windowing functions. MySQL has also supported them since 8.0.2 I believe. The additions to MySQL have been more recent, but it's still been a feature for about 18 months.

I came across this issue: https://github.com/r-dbi/RMariaDB/issues/89, which is why I'm posting here now. I couldn't find a previous discussion about this on this website.

MySQL/MariaDB's support for window functions is inaccurately mentioned in the documentation.

Sorry, I don't have a reprex, but this isn't a bug, it's a feature being intentionally turned off, and I'm not sure the simulate_mysql() would simulate a current MySQL or MariaDB if it's assumed those don't have features they have. Sending a windowing query through the MariaDB DBI works:

suppressPackageStartupMessages({
    # (it's not a reprex but I don't want to paste these package messages,
    # rather just the output I get)
    library(tidyverse)
    library(DBI)
    library(dbplyr)
})

con <- dbConnect(RMariaDB::MariaDB(), dbname = "mydb", group = "mygroup")
sql <- glue::glue("
SELECT `id_count`, RANK() 
OVER (ORDER BY `id_count` DESC) `rank` 
FROM (
  SELECT `id`,  count(`id`) `id_count` FROM `my_table` 
  WHERE `version_date` > '2017-12-31' GROUP BY `id`
) `t` 
ORDER BY `rank` LIMIT 10;
")
dbGetQuery(con, sql)
##   id_count rank
##1        13    1
##2        13    1
##3        13    1
##4        13    1
##5        13    1
##6        13    1
##7        13    1
##8        13    1
##9        13    1
##10       13    1

but I can't do it through dbplyr:

tbl(con, 'my_table') %>% 
   filter(version_date > '2017-12-31') %>% 
   group_by(id) %>% 
   summarize(id_count=n()) %>% 
   arrange(desc(id_count)) %>% 
   mutate(rank=min_rank(desc(id_count))) %>%
   select(id, rank)
##Error: Window function `rank()` is not supported by this database

I'm really a lot better with dplyr than I am with SQL. I use collect() to perform these kind of operations when I need to, as opposed to figuring out the SQL query and using dbGetQuery().

Are there plans to accommodate MariaDB and MySQL with windowing-query-generating code in dbplyr? Is the specific window query syntax much different than Postgres?

3 Likes

Sorry no, I am not confused about the documentation. That's why I linked to that specific "MYSQL/MARIADB" paragraph. The statement in that paragraph "but it does not support window functions (so you can't do grouped mutates and filters" is false and I maintain that MariaDB and MySQL do support windowing functions, but it's dbplyr's choice to ignore that and just say they're not supported. I provided links to the MySQL/MariaDB documentation that mentions that they are supported, and a not-exactly reproducible example of my experience of performing a windowing query on MariaDB through R.

I am not asking whether dbplyr can generate a windowing query when connected to MariaDB or not. I know it doesn't. I'm just asking why, because MariaDB/MySQL support them, the DBI and RMariaDB packages can handle them, it's just dbplyr which doesn't allow this.

At the very least the documentation should be corrected/clarified.

2 Likes

Oh, I see. I was thinking that the documentation was referencing dbplyr integration, not the databases themselves (I'll delete my comment above to avoid confusion).

For the documentation correction, could you please file an issue on GitHub in the dbplyr repo? That's the best way for the developers to keep track.

FWIW, it looks like SQLite gained window functions as well, the issue for which you can see here:

1 Like

I can try to make an issue on GitHub but I've usually been dismissed there and redirected here. Even the RMariaDB developer mentioned to come here.

I can't choose my database software. I'm stuck with what I have and I think it works fine. But I can mention that this issue may also apply to other databases. I hope they'll consider adding the functionality.

2 Likes

Sorry about that. Sometimes it's hard to get things sorted — oftentimes we'll ask to move something to community in case it's unclear whether or not it's a bug, and then re-open the thread with a better reprex/sense of what's going on if we discover that it is.

We just recently gave dbplyr it's own issues (we had had them combined with dplyr for a while), to keep track of this exact type of thing (the integration between dplyr and various flavors of SQL), so it's definitely the right place.

Hopefully we can get this added for MariaDB and SQLite in the next burst of dbplyr activity. :slightly_smiling_face:

1 Like

OK thanks, I made an issue:

https://github.com/tidyverse/dbplyr/issues/191

2 Likes

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.