I've either got a misunderstanding or a bug... I think it's a bug.
It seems that dplyr::arrange()
with the .by_group=TRUE
parameter set produces SQL with an error. Here's how to reprex it:
On the DB (Redshift in my case) set up a dummy table:
drop TABLE sandbox.testorder;
CREATE TABLE sandbox.testorder (
grp varchar(255),
n DOUBLE PRECISION
);
INSERT INTO sandbox.testorder (grp , n) VALUES ('a',3.3);
INSERT INTO sandbox.testorder (grp , n) VALUES ('a',1.1);
INSERT INTO sandbox.testorder (grp , n) VALUES ('b',2.2);
INSERT INTO sandbox.testorder (grp , n) VALUES ('b',4.4);
Then from R
(presuming a connection to the DB called con
and already loaded dbplyr
testorder <- tbl(con, "testorder")
testorder %>%
group_by( grp ) %>%
arrange( n, .by_group=TRUE) ->
out_test
show_query(out_test)
which generates the following SQL:
SELECT *
FROM "testorder"
ORDER BY "n", TRUE
which fails if I try to collect(out_test)
with the following error:
Error in postgresqlExecStatement(conn, statement, ...) :
RS-DBI driver: (could not Retrieve the result : ERROR: non-integer constant in ORDER BY
the rub seems to be the ,TRUE
there at the end. If I remove it, I get the followable runnable SQL:
SELECT *
FROM "testorder"
ORDER BY "n"
My guess is that the routine that generates the SQL has a glitch. Looks like it's just passing , TRUE
instead of adding in the group by
variables.
I have not dug into the dbplyr
or dplyr
code to try and find the magic incantation. Not being familiar with the code base I didn't really feel up to digging into that beast tonight.
If this should go somewhere else, like the dbplyr
github issues log, let me know. I wasn't 100% sure where the right place to post is.