Thanks for the quick help.
Using mutate, can we summarize by a different grouping variable as well?
Meaning...
In the code below, p.x is summed for every Species, but
can we use this approach to summarize for every Petal.Length? Since we don't mention the grouping variable explicitly, it was confusing for me.
Sorry, I don't follow you. Doesn't it generate the same result as your z? The columns are reordered, but I don't think that's a big deal.
I used the names p.x and p.y to simplify comparison between this and z. There, p.y was calculated after grouping by both Species and Petal.Length, and p.x was calculated after grouping only by Species. Why do you want to summarise with grouping by Petal.Length now?
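For reference, here is a minimal sketch of that pattern. It assumes the quantity being summed is Petal.Width, which is only an illustrative choice, and p.z is a hypothetical name for a total grouped by Petal.Length alone. Each group_by() before a mutate() decides which rows are pooled for the new column.

```r
library(dplyr)

res <- iris %>%
  # p.x: total within each Species
  group_by(Species) %>%
  mutate(p.x = sum(Petal.Width)) %>%
  # p.y: total within each Species / Petal.Length combination
  # (group_by() replaces the previous grouping by default)
  group_by(Species, Petal.Length) %>%
  mutate(p.y = sum(Petal.Width)) %>%
  # p.z (hypothetical): total within each Petal.Length, ignoring Species
  group_by(Petal.Length) %>%
  mutate(p.z = sum(Petal.Width)) %>%
  ungroup()
```

Because mutate() keeps all 150 rows, the grouping only changes which rows contribute to each total, not the shape of the result.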
I get confused too, so to keep track of the grouping I just print the intermediate results. The tee pipe %T>% from magrittr is very helpful for this. Based on what I've noticed, if you group by multiple variables, each summarise call drops the last remaining grouping variable: the first summarise removes the last one, the next removes the second-to-last, and so on.
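A small sketch of how that can be checked, assuming dplyr 1.0.0 or later (for the .groups argument) and, purely for illustration, a sum of Petal.Width. The names step1 and step2 are hypothetical.

```r
library(dplyr)
library(magrittr)  # provides the tee pipe %T>%

step1 <- iris %>%
  group_by(Species, Petal.Length) %>%
  # "drop_last" is the default here; it is spelled out to make the behaviour visible
  summarise(p.y = sum(Petal.Width), .groups = "drop_last") %T>%
  print()  # prints the intermediate result, then passes it on unchanged

group_vars(step1)  # "Species": Petal.Length has been dropped from the grouping

step2 <- step1 %>%
  summarise(p.x = sum(p.y))

group_vars(step2)  # character(0): no grouping is left
```

Calling group_vars() after each step is a quick way to see which grouping the next verb will use.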
If my solution doesn't work, can you please tell me what you want to do and share your expected output?
You can use the map approach to summarize over multiple sets of grouping columns and return a long summary data frame. For example:
library(tidyverse)

# Add a couple of grouping columns to iris
set.seed(2)
dat = iris %>%
  mutate(Group1=sample(c("A","B"), 150, replace=TRUE),
         Group2=sample(c("d","e"), 150, replace=TRUE))

# Get all combinations of 0 through 2 groups
map(0:2, ~combn(c("Species", "Group1","Group2"), .x, simplify=FALSE)) %>%
  flatten() %>%
  # Run summarise on each of the group sets created above
  map_df(~dat %>%
           group_by_at(.x) %>%
           summarise(N=n(),
                     Petal.Width=mean(Petal.Width))) %>%
  # Replace NA with "All" (representing marginalizing over that column)
  mutate_if(is.factor, as.character) %>%
  map_if(~!is.numeric(.), ~replace_na(., replace="All")) %>%
  bind_rows()
     N Petal.Width    Species Group1 Group2
1  150   1.1993333        All    All    All
2   50   0.2460000     setosa    All    All
3   50   1.3260000 versicolor    All    All
4   50   2.0260000  virginica    All    All
5   81   1.2444444        All      A    All
6   69   1.1463768        All      B    All
7   76   1.1447368        All    All      d
8   74   1.2554054        All    All      e
9   25   0.2640000     setosa      A    All
10  25   0.2280000     setosa      B    All
11  30   1.3466667 versicolor      A    All
12  20   1.2950000 versicolor      B    All
13  26   2.0692308  virginica      A    All
14  24   1.9791667  virginica      B    All
15  28   0.2500000     setosa    All      d
16  22   0.2409091     setosa    All      e
17  23   1.3521739 versicolor    All      d
18  27   1.3037037 versicolor    All      e
19  25   1.9560000  virginica    All      d
20  25   2.0960000  virginica    All      e
21  41   1.2048780        All      A      d
22  40   1.2850000        All      A      e
23  35   1.0742857        All      B      d
24  34   1.2205882        All      B      e
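To see where those 24 rows come from, it can help to look at the grouping sets on their own. This is a small sketch of just the first step of the pipeline; group_cols and group_sets are hypothetical names introduced for illustration.

```r
library(purrr)

group_cols <- c("Species", "Group1", "Group2")

# One list element per grouping set: all combinations of 0, 1 and 2 columns
group_sets <- flatten(map(0:2, ~ combn(group_cols, .x, simplify = FALSE)))

length(group_sets)  # 7: one empty set, three single columns, three pairs
group_sets[[1]]     # character(0): no grouping, which gives the overall "All" row
group_sets[[5]]     # "Species" "Group1"
```

Each set yields one summary block, so the row counts add up as 1 + (3 + 2 + 2) + (6 + 6 + 4) = 24. Changing 0:2 to 0:3 would also include the full three-way grouping.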
Sorry, I edited my answer earlier and accidentally deleted the code to create the new grouping columns. I've fixed it now. Thanks for pointing that out.