I've either got a misunderstanding or a bug... I think it's a bug.
It seems that
dplyr::arrange() with the
.by_group=TRUE parameter set produces SQL with an error. Here's how to reprex it:
On the DB (Redshift in my case) set up a dummy table:
drop TABLE sandbox.testorder; CREATE TABLE sandbox.testorder ( grp varchar(255), n DOUBLE PRECISION ); INSERT INTO sandbox.testorder (grp , n) VALUES ('a',3.3); INSERT INTO sandbox.testorder (grp , n) VALUES ('a',1.1); INSERT INTO sandbox.testorder (grp , n) VALUES ('b',2.2); INSERT INTO sandbox.testorder (grp , n) VALUES ('b',4.4);
R (presuming a connection to the DB called
con and already loaded
testorder <- tbl(con, "testorder") testorder %>% group_by( grp ) %>% arrange( n, .by_group=TRUE) -> out_test show_query(out_test)
which generates the following SQL:
SELECT * FROM "testorder" ORDER BY "n", TRUE
which fails if I try to
collect(out_test) with the following error:
Error in postgresqlExecStatement(conn, statement, ...) : RS-DBI driver: (could not Retrieve the result : ERROR: non-integer constant in ORDER BY
the rub seems to be the
,TRUE there at the end. If I remove it, I get the followable runnable SQL:
SELECT * FROM "testorder" ORDER BY "n"
My guess is that the routine that generates the SQL has a glitch. Looks like it's just passing
, TRUE instead of adding in the
group by variables.
I have not dug into the
dplyr code to try and find the magic incantation. Not being familiar with the code base I didn't really feel up to digging into that beast tonight.
If this should go somewhere else, like the
dbplyr github issues log, let me know. I wasn't 100% sure where the right place to post is.