Feedback on dplyr 1.1.0

mikecrobp · February 1, 2023, 10:47am

With dplyr 1.1.0 being released this week, I expected more questions/observations.
Maybe everyone held off, or tested extensively on pre-release versions.

My feedback FWIW:

1. I hit the new warnings far more often than I expected.
2. First, I hit the join warning when a row matches multiple rows. My initial reaction to the release notes had been that SQL doesn't see fit to warn, so why should dplyr? But in fact it has been useful to review those joins and make sure that one-to-many was intended. Hopefully I have deduplicated all of my data, but at least this warning will help me pick up any new duplicates. And now my code makes explicit what is expected.
3. Still on joins: I like the new "use join_by(x)" message, which encourages the new join_by() helper rather than a plain `by`. For extra credit, I'd ask that the message keep the preceding comma, so you can copy/paste it straight into the call without typing the comma yourself.
4. I am getting a few "Returning more (or less) than 1 row per summarise() group was deprecated in dplyr 1.1.0" warnings, typically on empty tibbles/data frames (more than 0 columns but 0 rows). I could do without those, and I can't see which line they come from with lifecycle::last_lifecycle_warnings().
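A minimal sketch of the two join points above, using made-up `customers` and `orders` tables (hypothetical, not from the original post). It assumes dplyr 1.1.0, where passing `multiple = "all"` declares that one-to-many matches are intended (which silences the new warning), and `join_by(id)` is the helper form of `by = "id"`.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical data: customer 1 has two orders, so the join is one-to-many
customers <- tibble(id = c(1, 2), name = c("Ann", "Bob"))
orders <- tibble(id = c(1, 1, 2), amount = c(10, 20, 5))

# join_by(id) is equivalent to by = "id".
# multiple = "all" states that multiple matches are expected, so the
# dplyr 1.1.0 multiple-matches warning is not raised for this join.
joined <- left_join(customers, orders, by = join_by(id), multiple = "all")
joined
```

Later dplyr releases reworked this warning around a `relationship` argument, so it is worth checking the join documentation for whichever version is installed.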

mara · February 1, 2023, 2:58pm

Thanks for the feedback. I've passed this along to the devs; replies might be a bit delayed since it's a company off-site this week. The third one might make for a good feature request (FR) on GitHub!

hadley · February 1, 2023, 3:12pm

The 3rd one is already an issue: Make `Joining with` message clickable to copy-to-clipboard (Issue #6580 · tidyverse/dplyr · GitHub)

mikecrobp · February 1, 2023, 3:20pm

Thank you both, and sorry for missing #3 on GitHub; mentioning the issue was an afterthought. I really liked the fact that the message had changed. As for #4, I have found the case that triggers it, and I will raise a new question in the community. Enjoy the off-site.

mara · February 1, 2023, 3:27pm

I think Hadley opened an issue for number 4, too:

github.com/tidyverse/dplyr: Incorrect warning when summarising empty data frame

(opened by hadley at 03:11PM, 01 Feb 23 UTC; closed at 04:07PM, 01 Feb 23 UTC)

```r
library(dplyr, warn.conflicts = FALSE)

df <- tibble(x = integer())
df |>
  summarise(y = x + 1)
#> Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
#> dplyr 1.1.0.
#> ℹ Please use `reframe()` instead.
#> ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
#> always returns an ungrouped data frame and adjust accordingly.
#> # A tibble: 0 × 1
#> # … with 1 variable: y <dbl>
```

<sup>Created on 2023-02-01 with [reprex v2.0.2](https://reprex.tidyverse.org)</sup>
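The warning text points to `reframe()`. For a summary that genuinely returns more than one row per group, a minimal sketch of the switch could look like this; the data and the quantile computation are hypothetical, chosen only to produce two rows per group.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical grouped data
df <- tibble(g = c("a", "a", "b", "b"), x = c(1, 2, 3, 4))

# reframe() allows any number of rows per group (here, two quantiles each).
# Unlike summarise(), its result is always ungrouped.
quantiles <- df %>%
  group_by(g) %>%
  reframe(
    prob = c(0.25, 0.75),
    q = unname(quantile(x, c(0.25, 0.75)))  # unname() drops the "25%"/"75%" labels
  )
quantiles
```

Because the result comes back ungrouped, any later per-group step needs an explicit `group_by()` again, which is what the warning's second hint is about.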

udostrasser · February 13, 2023, 9:14pm

After updating, I am having issues with data types. When working with a Snowflake database, numeric columns come through as character after collect().

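In the example below, the SQL and aggregation work properly, but the resulting data frame has a `total` column of type character.

```r
# tblSnowflake is a lazy table on the Snowflake connection
tblSnowflake %>%
  group_by(CHARACTER_COLUMN) %>%
  summarize(total = sum(NUMERIC_COLUMN)) %>%  # aggregation runs in the database
  collect()                                   # `total` arrives as character
```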

Unsure if this is a bug, but I did not have the issue previously.
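Until the cause is pinned down, one stopgap (an assumption, not a confirmed fix) is to coerce the affected column back to numeric after `collect()`. In this sketch a small local tibble named `collected` stands in for the collected Snowflake result; it is hypothetical data, not output from the real connection.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical stand-in for the collect() result: `total` arrived as character
collected <- tibble(CHARACTER_COLUMN = c("a", "b"), total = c("12.5", "40"))

# Convert the column back to numeric in R, after the data has been fetched
fixed <- collected %>%
  mutate(total = as.numeric(total))

str(fixed)
```

This only patches the symptom on the R side; the type mismatch itself still belongs in the GitHub issue.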

mara · February 14, 2023, 12:28pm

It sounds like this is a separate problem. Could you file an issue in the GitHub repo?
Thanks