dplyr::group_by

Xanadu · January 15, 2023, 8:00pm

Hi,
I have a data frame, and group_by doesn't seem to do anything when I apply it to a column.

df_Agg_Popular_Date %>%
    dplyr::group_by(Day)

## # Groups:   Day [30]
##      Day
##    <int>
##  1    17
##  2    25
##  3     1
##  4    19
##  5    16
##  6    20
##  7    24
##  8    30
##  9     2
## 10    16

16 is shown twice here, which means the group by hasn't been executed successfully.

I've looked on the internet and have tried loading plyr before dplyr, but without success.

# plyr must be loaded before dplyr to avoid grouping issues.
detach("package:plyr")
detach("package:ggpubr", unload = TRUE)

This results in the following error:
Error in detach("package:plyr") : invalid 'name' argument

jrkrideau · January 15, 2023, 8:16pm

My guess would be that Day is a character variable and there is an errant space in there, but I have been wrong before.

Can you supply us with the data? A handy way to supply some sample data is the dput() function. Just do dput(mydata) where mydata is your data. Copy the output and paste it here.
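
As a rough sketch of what that looks like: the data frame below is a made-up stand-in (only the column name Day comes from the post above, and the values are copied from the printed output), so treat the names and contents as assumptions.

```r
# Hypothetical stand-in for the real data; only the column name Day is from the thread.
mydata <- data.frame(Day = c(17L, 25L, 1L, 19L, 16L, 20L, 24L, 30L, 2L, 16L))

# dput() prints R code that rebuilds the object, including column types.
# For a large data frame, dput(head(mydata, 20)) keeps the output short.
dput(mydata)

# str() is a quick way to see whether Day is stored as integer or character.
str(mydata)
```

Pasting the dput() output into a reply lets others recreate the same object and see the real column types.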

EconProf · January 15, 2023, 8:43pm

group_by() does not change the contents of the data in any way. Every row stays in exactly the same position. However, if you then perform an operation on the grouped data frame, it is applied to each group separately.

library(tidyverse)

as_tibble(mtcars)
#> # A tibble: 32 × 11
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
#>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
#>  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
#>  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
#>  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#>  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
#>  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
#>  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
#>  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
#> 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
#> # … with 22 more rows
mtcars |> group_by(cyl)
#> # A tibble: 32 × 11
#> # Groups:   cyl [3]
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
#>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
#>  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
#>  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
#>  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#>  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
#>  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
#>  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
#>  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
#> 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
#> # … with 22 more rows
mtcars |> group_by(cyl) |> count()
#> # A tibble: 3 × 2
#> # Groups:   cyl [3]
#>     cyl     n
#>   <dbl> <int>
#> 1     4    11
#> 2     6     7
#> 3     8    14
mtcars |> group_by(cyl) |> summarise(mpg = mean(mpg))
#> # A tibble: 3 × 2
#>     cyl   mpg
#>   <dbl> <dbl>
#> 1     4  26.7
#> 2     6  19.7
#> 3     8  15.1

Created on 2023-01-15 with reprex v2.0.2
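
The same point on a small made-up Day column may be closer to the original question. This is only a sketch: the tibble days and its values are invented for illustration.

```r
library(dplyr)

# Hypothetical data with one repeated value (16).
days <- tibble(Day = c(17L, 25L, 1L, 16L, 16L))

# Grouping only attaches group metadata; no rows are dropped or merged.
grouped <- days |> group_by(Day)
nrow(grouped)      # same number of rows as days
n_groups(grouped)  # number of distinct Day values

# Rows are collapsed only once a summarising verb is applied.
per_day <- grouped |> summarise(n = n())
per_day
```

So a repeated 16 in the printed output of group_by() is expected; the duplicate disappears only in the summarised result.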

Xanadu · January 15, 2023, 9:44pm

I simply want to display the top 7 most common from the list

head(arrange(dplyr::select(dplyr::group_by(df_Agg_Popular_Date, Day), Day), desc(Day)), 7)

But the grouping is not applied at all. In SQL, I would write:

SELECT TOP 7 Day
FROM df_Agg_Popular_Date
GROUP BY Day
ORDER BY Day DESC

nirgrahamuk · January 15, 2023, 10:04pm

group_by() in dplyr does not transform the data; it modifies the behaviour of subsequent mutate() or summarise() calls.
It seems you want dplyr's distinct(), which deduplicates on a column.
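
A minimal sketch of that approach, assuming a made-up version of df_Agg_Popular_Date (the real data was not posted, so the values below are taken from the printed output earlier in the thread):

```r
library(dplyr)

# Hypothetical sample; only the names df_Agg_Popular_Date and Day come from the thread.
df_Agg_Popular_Date <- tibble(Day = c(17L, 25L, 1L, 19L, 16L, 20L, 24L, 30L, 2L, 16L))

# Rough equivalent of: SELECT TOP 7 Day ... GROUP BY Day ORDER BY Day DESC
top_days <- df_Agg_Popular_Date |>
  distinct(Day) |>        # one row per Day value
  arrange(desc(Day)) |>   # largest Day first
  head(7)
top_days

# If "most common" is meant as "most frequent", count() with sort = TRUE
# ranks the Day values by how often they occur instead.
most_common <- df_Agg_Popular_Date |>
  count(Day, sort = TRUE) |>
  head(7)
most_common
```

The first pipeline matches the SQL above; the second is an alternative reading of "top 7 most common" and may or may not be what was intended.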

Xanadu · January 15, 2023, 10:25pm

Thanks a lot, this is what I was looking for the whole time.
