Xanadu
January 15, 2023, 8:00pm
1
Hi,
I have a data frame and I can't get group_by() to work on it: the values don't appear to be grouped at all.
df_Agg_Popular_Date %>%
dplyr::group_by(Day)
## # Groups: Day [30]
## Day
## <int>
## 1 17
## 2 25
## 3 1
## 4 19
## 5 16
## 6 20
## 7 24
## 8 30
## 9 2
## 10 16
16 is shown twice here, which suggests the group by hasn't been applied.
I've looked on the internet and have tried loading plyr before dplyr, but without success.
# plyr must be loaded before dplyr to avoid grouping issues.
detach("package:plyr")
detach("package:ggpubr", unload = TRUE)
This results in the following error:
Error in detach("package:plyr") : invalid 'name' argument
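As a side note, detach() typically raises this "invalid 'name' argument" error when the named package is not on the search path, i.e. plyr was never attached in this session. A minimal sketch of a guarded detach follows; the variable names pkg and plyr_attached are illustrative, not from the original post.

```r
# Detach plyr only if it is currently attached; calling detach() on a
# package that is not on the search path raises an error.
pkg <- "package:plyr"  # hypothetical variable, for illustration only

if (pkg %in% search()) {
  # character.only = TRUE tells detach() that `pkg` holds the name as a string
  detach(pkg, character.only = TRUE)
}

# TRUE only if plyr is still on the search path after the guard
plyr_attached <- pkg %in% search()
```

If plyr was never attached, it cannot be masking dplyr's functions, so the detach step is probably unnecessary here.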
My guess would be that Day is a character variable and there is an errant space in there, but I have been wrong before.
Can you supply us with the data? A handy way to supply some sample data is the dput() function. Just do dput(mydata) where mydata is your data. Copy the output and paste it here.
group_by() does not change the contents of the data in any way. Every row will be in exactly the same position. However, if you perform an operation on the data frame it will do it for each group separately.
library(tidyverse)
as_tibble(mtcars)
#> # A tibble: 32 × 11
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
#> 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
#> 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
#> 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
#> 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
#> 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
#> 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
#> 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
#> 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
#> 10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
#> # … with 22 more rows
mtcars |> group_by(cyl)
#> # A tibble: 32 × 11
#> # Groups: cyl [3]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
#> 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
#> 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
#> 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
#> 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
#> 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
#> 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
#> 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
#> 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
#> 10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
#> # … with 22 more rows
mtcars |> group_by(cyl) |> count()
#> # A tibble: 3 × 2
#> # Groups: cyl [3]
#> cyl n
#> <dbl> <int>
#> 1 4 11
#> 2 6 7
#> 3 8 14
mtcars |> group_by(cyl) |> summarise(mpg = mean(mpg))
#> # A tibble: 3 × 2
#> cyl mpg
#> <dbl> <dbl>
#> 1 4 26.7
#> 2 6 19.7
#> 3 8 15.1
Created on 2023-01-15 with reprex v2.0.2
Xanadu
January 15, 2023, 9:44pm
4
I simply want to display the top 7 most common values from the list:
head(arrange(dplyr::select(dplyr::group_by(df_Agg_Popular_Date, Day), Day), desc(Day)), 7)
But the grouping is not applied at all. In SQL, I would write:
SELECT TOP 7 Day
FROM df_Agg_Popular_Date
GROUP BY Day
ORDER BY Day DESC
group_by() in dplyr does not transform the data; it modifies the behaviour of subsequent mutate() or summarise() calls.
It seems you want dplyr's distinct(), which deduplicates on a column.
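A minimal sketch of that approach, assuming the aim is to mirror the SQL query above. The data frame df_example is a hypothetical stand-in built from the ten Day values printed earlier, since the real df_Agg_Popular_Date was not posted.

```r
library(dplyr)

# Hypothetical stand-in for df_Agg_Popular_Date (the real data was not shared);
# these are the ten Day values shown in the first post.
df_example <- tibble(Day = c(17L, 25L, 1L, 19L, 16L, 20L, 24L, 30L, 2L, 16L))

# One row per distinct Day, largest first, keep the first 7.
# This mirrors SELECT TOP 7 Day ... GROUP BY Day ORDER BY Day DESC.
top_days <- df_example |>
  distinct(Day) |>
  arrange(desc(Day)) |>
  head(7)

# If "most common" is meant as "most frequent", count occurrences instead
# and sort by the count column n.
most_common <- df_example |>
  count(Day, sort = TRUE) |>
  head(7)
```

Which of the two fits depends on whether the top 7 should be ranked by the Day value itself or by how often each Day occurs.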
Xanadu
January 15, 2023, 10:25pm
6
Thanks a lot, this is what I was looking for the whole time.