Hi Posit users.
Having issues utilizing rowwise/group_by()
with an across()
call when employing duckdb/dbplyr
Specifically, given data like the following:
df <- structure(list(ID = c("000000", "000000", "000001", "000002",
"000002", "000002", "000003", "000004", "000004", "000004"),
date = structure(c(19039, 10855, 11458, 16996, 11592, 12133,
11678, 12560, 17016, 18938), class = "Date"), Var_00 = c(0,
0, 0, 0, 1, 1, 1, 1, 1, 0), Var_01 = c(0, 0, 0, 0, 1, 1,
1, 1, 1, 0), Var_02 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), Var_03 = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0), Var_04 = c(0, 0, 0, 0, 0, 0,
0, 0, 0, 0), Var_05 = c(0, 0, 0, 0, 1, 1, 1, 1, 1, 0), Var_06 = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0), Var_07 = c(0, 0, 0, 0, 1, 1,
1, 1, 1, 0), Var_08 = c(0, 0, 0, 0, 1, 1, 1, 1, 1, 0), Var_09 = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0)), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))
I'd like to produce output like this:
dput(dfout)
structure(list(ID = c("000000", "000000", "000001", "000002",
"000002", "000002", "000003", "000004", "000004", "000004"),
date = structure(c(19039, 10855, 11458, 16996, 11592, 12133,
11678, 12560, 17016, 18938), class = "Date"), Var_00 = c(0,
0, 0, 0, 1, 1, 1, 1, 1, 0), Var_01 = c(0, 0, 0, 0, 1, 1,
1, 1, 1, 0), Var_02 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), Var_03 = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0), Var_04 = c(0, 0, 0, 0, 0, 0,
0, 0, 0, 0), Var_05 = c(0, 0, 0, 0, 1, 1, 1, 1, 1, 0), Var_06 = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0), Var_07 = c(0, 0, 0, 0, 1, 1,
1, 1, 1, 0), Var_08 = c(0, 0, 0, 0, 1, 1, 1, 1, 1, 0), Var_09 = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0), sum = c(0, 0, 0, 0, 5, 5, 5,
5, 5, 0)), class = c("rowwise_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -10L), groups = structure(list(.rows = structure(list(
1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame")))
Where the sum
column is the added total by row across all columns for a given row, using duckdb
and duckplyr
, as the actual data is on the order of a few million rows.
In standard dplyr
, I would use a call like one of the following:
out <- df |> rowwise() |> mutate(sum = sum(across(all_of(Var_list))))
out <- df |> rowwise() |> mutate(sum = sum(across(starts_with("Var"))))
out <- df |> group_by(ID,date) |> mutate(sum = sum(across(all_of(Var_list))))
out <- df |> group_by(ID,date) |> mutate(sum = sum(across(starts_with("Var"))))
However, rowwise()
doesn't seem to be implemented in duckplyr()
out <- df |> to_duckdb() |> rowwise() |> mutate(sum = sum(across(all_of(Var_list))))
Error in UseMethod("rowwise") :
no applicable method for 'rowwise' applied to an object of class "c('tbl_duckdb_connection', 'tbl_dbi', 'tbl_sql', 'tbl_lazy', 'tbl')"
out <- df |> to_duckdb() |> rowwise() |> mutate(sum = sum(across(starts_with("Var"))))
Error in UseMethod("rowwise") :
no applicable method for 'rowwise' applied to an object of class "c('tbl_duckdb_connection', 'tbl_dbi', 'tbl_sql', 'tbl_lazy', 'tbl')"
And using group_by()
when the data is converted to a duckdb
throws a SQL
interpretation error I haven't quite understood the suggested conversion for:
> out <- df |> to_duckdb() |> group_by(ID,date) |> mutate(sum = sum(across(all_of(Var_list))))
> out
Error:
! Cannot translate a symbol to SQL.
ℹ Do you want to force evaluation in R with (e.g.) `!!x` or `local(x)`?
Run `rlang::last_trace()` to see where the error occurred.
> rlang::last_trace()
<error/rlang_error>
Error:
! Cannot translate a symbol to SQL.
ℹ Do you want to force evaluation in R with (e.g.) `!!x` or `local(x)`?
---
Backtrace:
▆
1. ├─base (local) `<fn>`(x)
2. ├─dbplyr:::print.tbl_sql(x)
3. │ ├─dbplyr:::cat_line(format(x, ..., n = n, width = width, n_extra = n_extra))
4. │ │ ├─base::cat(paste0(..., "\n"), sep = "")
5. │ │ └─base::paste0(..., "\n")
6. │ ├─base::format(x, ..., n = n, width = width, n_extra = n_extra)
7. │ └─pillar:::format.tbl(x, ..., n = n, width = width, n_extra = n_extra)
8. │ └─pillar:::format_tbl(...)
9. │ └─pillar::tbl_format_setup(...)
10. │ ├─pillar:::tbl_format_setup_dispatch(...)
11. │ └─pillar:::tbl_format_setup.tbl(...)
12. │ └─pillar:::df_head(x, n + 1)
13. │ ├─base::as.data.frame(head(x, n))
14. │ └─dbplyr:::as.data.frame.tbl_sql(head(x, n))
15. │ ├─base::as.data.frame(collect(x, n = n))
16. │ ├─dplyr::collect(x, n = n)
17. │ └─dbplyr:::collect.tbl_sql(x, n = n)
18. │ └─dbplyr::db_sql_render(x$src$con, x, cte = cte)
19. │ ├─dbplyr::db_sql_render(con, sql, ..., sql_options = sql_options)
20. │ └─dbplyr:::db_sql_render.DBIConnection(con, sql, ..., sql_options = sql_options)
21. │ ├─dbplyr::sql_render(sql, con = con, ..., sql_options = sql_options)
22. │ └─dbplyr:::sql_render.tbl_lazy(sql, con = con, ..., sql_options = sql_options)
23. │ ├─dbplyr::sql_render(...)
24. │ └─dbplyr:::sql_render.lazy_query(...)
25. │ ├─dbplyr::sql_build(query, con = con, sql_options = sql_options)
26. │ └─dbplyr:::sql_build.lazy_select_query(query, con = con, sql_options = sql_options)
27. │ └─dbplyr:::get_select_sql(...)
28. │ └─dbplyr:::translate_select_sql(con, select)
29. │ └─dbplyr::translate_sql_(...)
30. │ └─base::lapply(...)
31. │ └─dbplyr (local) FUN(X[[i]], ...)
32. │ ├─dbplyr::escape(eval_tidy(x, mask), con = con)
33. │ └─rlang::eval_tidy(x, mask)
34. ├─dbplyr (local) sum(`<named list>`)
35. │ ├─dbplyr::win_over(...)
36. │ │ └─dbplyr:::glue_sql2(con, "{.val expr} OVER {.val over}")
37. │ │ ├─dbplyr::sql(...)
38. │ │ │ └─dbplyr:::c_character(...)
39. │ │ └─glue::glue(...)
40. │ │ └─glue::glue_data(...)
41. │ └─dbplyr:::glue_sql2(sql_current_con(), "{f}({.val x})")
42. │ ├─dbplyr::sql(...)
43. │ │ └─dbplyr:::c_character(...)
44. │ └─glue::glue(...)
45. │ └─glue::glue_data(...)
46. ├─glue (local) `<fn>`(".val expr")
47. │ ├─.transformer(expr, env) %||% .null
48. │ └─dbplyr (local) .transformer(expr, env)
49. │ └─base::eval(parse(text = value, keep.source = FALSE), envir)
50. │ └─base::eval(parse(text = value, keep.source = FALSE), envir)
51. └─glue (local) `<fn>`(".val x")
52. ├─.transformer(expr, env) %||% .null
53. └─dbplyr (local) .transformer(expr, env)
54. ├─dbplyr::escape(value, con = connection)
55. └─dbplyr:::escape.list(value, con = connection)
56. └─base::vapply(x, escape, character(1), con = con)
57. ├─dbplyr (local) FUN(X[[i]], ...)
58. └─dbplyr:::escape.default(X[[i]], ...)
59. └─dbplyr:::error_embed(obj_type_friendly(x), "x")
Run rlang::last_trace(drop = FALSE) to see 2 hidden frames.
(see below for hidden frame output of similar command)
> out <- df |> to_duckdb() |> group_by(ID,date) |> mutate(sum = sum(across(starts_with("Var"))))
> out
Error:
! Cannot translate a symbol to SQL.
ℹ Do you want to force evaluation in R with (e.g.) `!!x` or `local(x)`?
Run `rlang::last_trace()` to see where the error occurred.
> rlang::last_trace()
<error/rlang_error>
Error:
! Cannot translate a symbol to SQL.
ℹ Do you want to force evaluation in R with (e.g.) `!!x` or `local(x)`?
---
Backtrace:
▆
1. ├─base (local) `<fn>`(x)
2. ├─dbplyr:::print.tbl_sql(x)
3. │ ├─dbplyr:::cat_line(format(x, ..., n = n, width = width, n_extra = n_extra))
4. │ │ ├─base::cat(paste0(..., "\n"), sep = "")
5. │ │ └─base::paste0(..., "\n")
6. │ ├─base::format(x, ..., n = n, width = width, n_extra = n_extra)
7. │ └─pillar:::format.tbl(x, ..., n = n, width = width, n_extra = n_extra)
8. │ └─pillar:::format_tbl(...)
9. │ └─pillar::tbl_format_setup(...)
10. │ ├─pillar:::tbl_format_setup_dispatch(...)
11. │ └─pillar:::tbl_format_setup.tbl(...)
12. │ └─pillar:::df_head(x, n + 1)
13. │ ├─base::as.data.frame(head(x, n))
14. │ └─dbplyr:::as.data.frame.tbl_sql(head(x, n))
15. │ ├─base::as.data.frame(collect(x, n = n))
16. │ ├─dplyr::collect(x, n = n)
17. │ └─dbplyr:::collect.tbl_sql(x, n = n)
18. │ └─dbplyr::db_sql_render(x$src$con, x, cte = cte)
19. │ ├─dbplyr::db_sql_render(con, sql, ..., sql_options = sql_options)
20. │ └─dbplyr:::db_sql_render.DBIConnection(con, sql, ..., sql_options = sql_options)
21. │ ├─dbplyr::sql_render(sql, con = con, ..., sql_options = sql_options)
22. │ └─dbplyr:::sql_render.tbl_lazy(sql, con = con, ..., sql_options = sql_options)
23. │ ├─dbplyr::sql_render(...)
24. │ └─dbplyr:::sql_render.lazy_query(...)
25. │ ├─dbplyr::sql_build(query, con = con, sql_options = sql_options)
26. │ └─dbplyr:::sql_build.lazy_select_query(query, con = con, sql_options = sql_options)
27. │ └─dbplyr:::get_select_sql(...)
28. │ └─dbplyr:::translate_select_sql(con, select)
29. │ └─dbplyr::translate_sql_(...)
30. │ └─base::lapply(...)
31. │ └─dbplyr (local) FUN(X[[i]], ...)
32. │ ├─dbplyr::escape(eval_tidy(x, mask), con = con)
33. │ └─rlang::eval_tidy(x, mask)
34. ├─dbplyr (local) sum(`<named list>`)
35. │ ├─dbplyr::win_over(...)
36. │ │ └─dbplyr:::glue_sql2(con, "{.val expr} OVER {.val over}")
37. │ │ ├─dbplyr::sql(...)
38. │ │ │ └─dbplyr:::c_character(...)
39. │ │ └─glue::glue(...)
40. │ │ └─glue::glue_data(...)
41. │ └─dbplyr:::glue_sql2(sql_current_con(), "{f}({.val x})")
42. │ ├─dbplyr::sql(...)
43. │ │ └─dbplyr:::c_character(...)
44. │ └─glue::glue(...)
45. │ └─glue::glue_data(...)
46. ├─glue (local) `<fn>`(".val expr")
47. │ ├─.transformer(expr, env) %||% .null
48. │ └─dbplyr (local) .transformer(expr, env)
49. │ └─base::eval(parse(text = value, keep.source = FALSE), envir)
50. │ └─base::eval(parse(text = value, keep.source = FALSE), envir)
51. └─glue (local) `<fn>`(".val x")
52. ├─.transformer(expr, env) %||% .null
53. └─dbplyr (local) .transformer(expr, env)
54. ├─dbplyr::escape(value, con = connection)
55. └─dbplyr:::escape.list(value, con = connection)
56. └─base::vapply(x, escape, character(1), con = con)
57. ├─dbplyr (local) FUN(X[[i]], ...)
58. └─dbplyr:::escape.default(X[[i]], ...)
59. └─dbplyr:::error_embed(obj_type_friendly(x), "x")
Run rlang::last_trace(drop = FALSE) to see 2 hidden frames.
> rlang::last_trace(drop = FALSE)
<error/rlang_error>
Error:
! Cannot translate a symbol to SQL.
ℹ Do you want to force evaluation in R with (e.g.) `!!x` or `local(x)`?
---
Backtrace:
▆
1. ├─base (local) `<fn>`(x)
2. ├─dbplyr:::print.tbl_sql(x)
3. │ ├─dbplyr:::cat_line(format(x, ..., n = n, width = width, n_extra = n_extra))
4. │ │ ├─base::cat(paste0(..., "\n"), sep = "")
5. │ │ └─base::paste0(..., "\n")
6. │ ├─base::format(x, ..., n = n, width = width, n_extra = n_extra)
7. │ └─pillar:::format.tbl(x, ..., n = n, width = width, n_extra = n_extra)
8. │ └─pillar:::format_tbl(...)
9. │ └─pillar::tbl_format_setup(...)
10. │ ├─pillar:::tbl_format_setup_dispatch(...)
11. │ └─pillar:::tbl_format_setup.tbl(...)
12. │ └─pillar:::df_head(x, n + 1)
13. │ ├─base::as.data.frame(head(x, n))
14. │ └─dbplyr:::as.data.frame.tbl_sql(head(x, n))
15. │ ├─base::as.data.frame(collect(x, n = n))
16. │ ├─dplyr::collect(x, n = n)
17. │ └─dbplyr:::collect.tbl_sql(x, n = n)
18. │ └─dbplyr::db_sql_render(x$src$con, x, cte = cte)
19. │ ├─dbplyr::db_sql_render(con, sql, ..., sql_options = sql_options)
20. │ └─dbplyr:::db_sql_render.DBIConnection(con, sql, ..., sql_options = sql_options)
21. │ ├─dbplyr::sql_render(sql, con = con, ..., sql_options = sql_options)
22. │ └─dbplyr:::sql_render.tbl_lazy(sql, con = con, ..., sql_options = sql_options)
23. │ ├─dbplyr::sql_render(...)
24. │ └─dbplyr:::sql_render.lazy_query(...)
25. │ ├─dbplyr::sql_build(query, con = con, sql_options = sql_options)
26. │ └─dbplyr:::sql_build.lazy_select_query(query, con = con, sql_options = sql_options)
27. │ └─dbplyr:::get_select_sql(...)
28. │ └─dbplyr:::translate_select_sql(con, select)
29. │ └─dbplyr::translate_sql_(...)
30. │ └─base::lapply(...)
31. │ └─dbplyr (local) FUN(X[[i]], ...)
32. │ ├─dbplyr::escape(eval_tidy(x, mask), con = con)
33. │ └─rlang::eval_tidy(x, mask)
34. ├─dbplyr (local) sum(`<named list>`)
35. │ ├─dbplyr::win_over(...)
36. │ │ └─dbplyr:::glue_sql2(con, "{.val expr} OVER {.val over}")
37. │ │ ├─dbplyr::sql(...)
38. │ │ │ └─dbplyr:::c_character(...)
39. │ │ └─glue::glue(...)
40. │ │ └─glue::glue_data(...)
41. │ └─dbplyr:::glue_sql2(sql_current_con(), "{f}({.val x})")
42. │ ├─dbplyr::sql(...)
43. │ │ └─dbplyr:::c_character(...)
44. │ └─glue::glue(...)
45. │ └─glue::glue_data(...)
46. ├─glue (local) `<fn>`(".val expr")
47. │ ├─.transformer(expr, env) %||% .null
48. │ └─dbplyr (local) .transformer(expr, env)
49. │ └─base::eval(parse(text = value, keep.source = FALSE), envir)
50. │ └─base::eval(parse(text = value, keep.source = FALSE), envir)
51. └─glue (local) `<fn>`(".val x")
52. ├─.transformer(expr, env) %||% .null
53. └─dbplyr (local) .transformer(expr, env)
54. ├─dbplyr::escape(value, con = connection)
55. └─dbplyr:::escape.list(value, con = connection)
56. └─base::vapply(x, escape, character(1), con = con)
57. ├─dbplyr (local) FUN(X[[i]], ...)
58. └─dbplyr:::escape.default(X[[i]], ...)
59. └─dbplyr:::error_embed(obj_type_friendly(x), "x")
60. └─cli::cli_abort(...)
61. └─rlang::abort(...)
I'm assuming this is some sort of R
to SQL
translation on the backend that's getting lost that can be fixed with a direct call to one or the other in some way, but it isn't intuitive to me how to force evaluation in R with (e.g.) `!!x` or `local(x)`?
as it were.
Would you happen to have any suggestions or an example as to how I could perform or get around this?
Thank you in advance!
Edit:
Forgot loaded packages.
Apologies
microbenchmark_1.4.10
profmem_0.6.0
fastDummies_1.7.3
glue_1.7.0
data.table_1.15.4
janitor_2.2.0
readxl_1.4.3
reticulate_1.38.0
parquetize_0.5.7
duckplyr_0.4.1
duckdb_1.0.0-2
DBI_1.2.3
arrow_16.1.0
haven_2.5.4
fs_1.6.4
lubridate_1.9.3
forcats_1.0.0
stringr_1.5.1
dplyr_1.1.4
purrr_1.0.2
readr_2.1.5
tidyr_1.3.1
tibble_3.2.1
ggplot2_3.5.1
tidyverse_2.0.0