Feedback on dplyr 1.1.0

With dplyr 1.1.0 being released this week, I expected more questions/observations.
Maybe everyone held off or tested extensively on pre-release versions.

My feedback FWIW:

  1. I hit the new warnings way more than I expected
  2. First I hit the join warning when multiple matches are made. My initial reaction to the release notes had been that SQL didn't see fit to warn, so why should dplyr? But in fact it has been useful to review those joins and make sure 1->Many was intended. Hopefully I have deduplicated all of my data, but at least this warning will help me pick up on any new duplicates. And now my code makes explicit what is expected.
  3. And still on joins: I like the new "use join_by(x)" message, which encourages use of the new helper function rather than a bare `by`. Though for extra credit I'd ask for the message to keep the preceding comma, so that you can copy/paste it in without having to type the comma.
  4. I am getting a few "Returning more (or less) than 1 row per summarise() group was deprecated in dplyr 1.1.0" warnings, typically on empty tibbles/dataframes (>0 columns but 0 rows). I could do without those. And I can't see which line they come from with lifecycle::last_lifecycle_warnings().
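
For anyone following along, points 2 and 3 can be sketched roughly as below. The `customers` and `orders` tibbles are made up for illustration. As far as I understand the 1.1.0 release notes, `multiple = "all"` is the way to say that several matches per row are intended, and `by = join_by(...)` states the key explicitly so the "Joining with" message is not needed.

```r
library(dplyr)

# Hypothetical data: one customer can have many orders (1->Many).
customers <- tibble(customer_id = c(1, 2), name = c("Ann", "Bob"))
orders <- tibble(customer_id = c(1, 1, 2), amount = c(10, 20, 5))

# join_by() names the key explicitly.
# multiple = "all" declares that multiple matches per row are expected,
# which is what makes the intent explicit in dplyr 1.1.0.
joined <- customers %>%
  left_join(orders, by = join_by(customer_id), multiple = "all")

joined
```

If a join is supposed to be strictly 1->1, leaving the argument out and treating the warning as a prompt to deduplicate seems like the more useful default.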

Thanks for the feedback. I've passed this along to the devs; replies might be a bit delayed since there is a company off-site this week. The third one might make for a good feature request on GitHub! :slightly_smiling_face:

The 3rd one is already an issue :smile: See "Make `Joining with` message clickable to copy-to-clipboard" (Issue #6580, tidyverse/dplyr on GitHub).

Thank you both. Sorry for missing #3 in GitHub. Mentioning the issue was an afterthought. I really liked the fact that the warning had changed. As for #4 I have found the case where I have an issue but I will raise a new question in the community. Enjoy the offsite.
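
In case it helps others who see the #4 warning: the replacement dplyr 1.1.0 points to for results with more or fewer than one row per group is `reframe()`. A minimal sketch, with a made-up `df`:

```r
library(dplyr)

# Hypothetical data with two groups.
df <- tibble(g = c("a", "a", "a", "b", "b"), x = c(1, 2, 3, 10, 20))

# reframe() is intended for results with any number of rows per group,
# so returning two rows per group here is not a deprecated use.
ranges <- df %>%
  group_by(g) %>%
  reframe(stat = c("min", "max"), value = range(x))

ranges

# To locate the call that raises a deprecation warning, one option
# (an assumption about your workflow, not an official recommendation)
# is to temporarily promote lifecycle warnings to errors and read the traceback:
# options(lifecycle_verbosity = "error")
```

Note that `reframe()` always returns an ungrouped tibble, so any later grouped step needs its own `group_by()`.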

I think Hadley opened an issue for number 4, too:

After updating, I am having issues with data types. Working with a Snowflake database, numeric columns are coming through as character after collect().

In the example below, the SQL and aggregation works properly, but the resulting data frame would have a total column of type character.

```r
tblSnowflake %>%
  group_by(CHARACTER_COLUMN) %>%
  summarize(total = sum(NUMERIC_COLUMN)) %>%
  collect()
```

Unsure if this is a bug, but I did not have the issue previously.
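
As a stopgap while this gets sorted out, an explicit conversion after `collect()` should restore the type. This is only a sketch: the `collected` tibble below is a local stand-in for what comes back from Snowflake, and it does nothing about the underlying cause.

```r
library(dplyr)

# Hypothetical stand-in for the collected result, where `total`
# arrived as character instead of numeric.
collected <- tibble(CHARACTER_COLUMN = c("a", "b"), total = c("10.5", "3"))

# Convert the column back to numeric explicitly.
fixed <- collected %>%
  mutate(total = as.numeric(total))

# Check the resulting column type.
class(fixed$total)
```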

It sounds like this is a separate problem. Could you file an issue in the GitHub repo?
Thanks