group_by() in tidytable outputs differently than dplyr group_by()

SteeleBuns · October 3, 2025, 9:53pm

Hi folks. First time poster here. I've been using the tidyverse/dplyr for a number of years. I recently switched over to the tidytable package and have found an odd behavior with the group_by() function compared to the dplyr group_by() function that I don't understand. Typically, I group by one variable and then count a second variable, outputting the counts within a group, as seen below.

library(tidyverse)
library(tidytable)
data("starwars")

starwars %>% group_by(sex) %>% count(gender) # defaults to tidytable
# output only counts the group sex

as_tibble(starwars) %>% dplyr::group_by(sex) %>% dplyr::count(gender)
# output counts gender within the groups sex

Can anyone help me understand what is going on, or what I may be misunderstanding?

jrkrideau · October 4, 2025, 1:36am

I think one problem is that starwars is a tibble, not a data table. Also, it looks like {data.table} and {tidyverse} just do things differently, and {tidytable} is defaulting to {data.table} syntax sometimes.

Let's assume we have this:

# load packages -----------------------------------------------------------
library(data.table)
library(tidyverse)
library(tidytable)
# Load data ---------------------------------------------------------------
data("starwars")
DT <- as.data.table(starwars)

Here is how you would do the operation in {data.table}.

DT[ , .N,  by = c("sex", "gender") ]

This seems to give us the same thing in {tidytable}.

DT %>% group_by(sex, gender) %>% count() # columns passed to count() on a grouped tidytable are ignored

margusl · October 4, 2025, 8:31am

I believe this is (kind of) documented behaviour. While dplyr::count() lets you combine a grouped input frame with columns to add another grouping layer, the {tidytable} implementation uses either the groups or the columns, not both:

count() returns counts by group on a grouped tidytable, or column names can be specified to return counts by group. ( Count observations by group — count • tidytable )

Meaning that when used with a grouped frame (object with class grouped_tt), columns passed to tidytable::count() are just discarded and never evaluated. You can check tidytable:::count.grouped_tt or pass some non-existing columns to count() and check that it never throws an error or warning:

dplyr::starwars |> 
  tidytable::group_by(sex) |>
  tidytable::count(foobar)
#> # A tidytable: 5 × 2
#> # Groups:      sex
#>   sex                n
#>   <chr>          <int>
#> 1 <NA>               4
#> 2 female            16
#> 3 hermaphroditic     1
#> 4 male              60
#> 5 none               6
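
If a frame arrives already grouped, one way to still count by extra columns is to drop the groups first and name every column in count(). This is only a minimal sketch; the object name by_sex is made up for illustration, and the class check relies on the grouped_tt class mentioned above:

```r
library(tidytable)

# Illustrative object name: a tidytable grouped by sex
by_sex <- dplyr::starwars |>
  group_by(sex)

# A grouped tidytable carries the class "grouped_tt"; on such objects
# count() ignores any columns passed to it
is_grouped <- inherits(by_sex, "grouped_tt")

# Drop the groups, then name every counting column explicitly
counts <- by_sex |>
  ungroup() |>
  count(sex, gender)

counts
```

With the groups removed, count(sex, gender) behaves like the ungrouped count(sex, gender) call on the raw data.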

I can't say that your example is an anti-pattern, but it's definitely more common to let count() do all the grouping or none at all, and for the latter case many actually use tally() instead. All three cases work the same with both {dplyr} and {tidytable}:

dplyr::starwars |> 
  tidytable::count(sex, gender)
#> # A tidytable: 6 × 3
#>   sex            gender        n
#>   <chr>          <chr>     <int>
#> 1 <NA>           <NA>          4
#> 2 female         feminine     16
#> 3 hermaphroditic masculine     1
#> 4 male           masculine    60
#> 5 none           feminine      1
#> 6 none           masculine     5

dplyr::starwars |> 
  tidytable::group_by(sex, gender) |>
  tidytable::count()
#> # A tidytable: 6 × 3
#> # Groups:      sex, gender
#>   sex            gender        n
#>   <chr>          <chr>     <int>
#> 1 <NA>           <NA>          4
#> 2 female         feminine     16
#> 3 hermaphroditic masculine     1
#> 4 male           masculine    60
#> 5 none           feminine      1
#> 6 none           masculine     5

dplyr::starwars |> 
  tidytable::group_by(sex, gender) |>
  tidytable::tally()
#> # A tidytable: 6 × 3
#> # Groups:      sex
#>   sex            gender        n
#>   <chr>          <chr>     <int>
#> 1 <NA>           <NA>          4
#> 2 female         feminine     16
#> 3 hermaphroditic masculine     1
#> 4 male           masculine    60
#> 5 none           feminine      1
#> 6 none           masculine     5

If you do need to pass grouped frames and a set of columns to count(), perhaps try {dtplyr}. It comes with fewer translated verbs, but because it uses lazy evaluation and translates the whole pipeline for data.table in one go, rather than eagerly like {tidytable}, there can be a few such corner cases where it behaves more like {dplyr}:

library(dtplyr)
library(dplyr, warn.conflicts = FALSE)

starwars |> 
  lazy_dt() |>
  group_by(sex) |> 
  count(gender)
#> Source: local data table [6 x 3]
#> Call:   `_DT1`[, .(n = .N), keyby = .(sex, gender)]
#> 
#>   sex            gender        n
#>   <chr>          <chr>     <int>
#> 1 <NA>           <NA>          4
#> 2 female         feminine     16
#> 3 hermaphroditic masculine     1
#> 4 male           masculine    60
#> 5 none           feminine      1
#> 6 none           masculine     5
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results

Created on 2025-10-04 with reprex v2.1.1
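
Because the {dtplyr} pipeline is lazy, nothing is computed until the result is collected. A minimal sketch of inspecting the generated data.table call and then materialising the counts; the object names lazy_counts and counts are made up for illustration:

```r
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)

# Build the lazy pipeline; no counting happens yet
lazy_counts <- starwars |>
  lazy_dt() |>
  group_by(sex) |>
  count(gender)

# Show the data.table expression that dtplyr generated
show_query(lazy_counts)

# Run the query and collect the result as a tibble
counts <- as_tibble(lazy_counts)
counts
```

as.data.table() or as.data.frame() can be used in place of as_tibble(), as the printed hint in the output above says.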

SteeleBuns · October 6, 2025, 8:26pm

Thanks @jrkrideau . This is helpful to translate it to data.table() and to understand what is going on with the count() function.

SteeleBuns · October 6, 2025, 8:29pm

Thanks @margusl for interpreting the documentation for me. I did not understand how count() treats columns on a grouped frame until you explained clearly that they are discarded. The solution you provided with the tally() function is exactly what I was looking for. I typically work with percentages and counts, so grouping by the first variable, sex, and then computing the percentage that each gender row makes up within it is part of my workflow. I never really knew how count() and tally() differ. Thank you again.
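
For the percentage workflow described here, one possible sketch in {tidytable} is below. It assumes the percentage wanted is each gender's share within its sex group; the column name pct and the object name pct_by_sex are made up for illustration:

```r
library(tidytable)

pct_by_sex <- dplyr::starwars |>
  # Let count() do all the grouping
  count(sex, gender) |>
  # .by groups only for this mutate() call, so sum(n) is the total per sex
  mutate(pct = 100 * n / sum(n), .by = sex)

pct_by_sex
```

Using .by keeps the grouping local to the one step, so there is no grouped frame left over to interact with a later count() call.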
