Mean based on grouping by more than one variable using Dplyr

Hi,
I have this simple df:

source <- data.frame(
             stringsAsFactors = FALSE,
                  Employee.ID = c("aaa",
                                  "bbb","sss","ccc","fff","ffg","gedd"),
                    Age.Group = c("20-30",
                                  "30-40","20-30","40-50","40-50","20-30",
                                  "20-30"),
            Action_Positivity = c(100, 0, 100, 0, 0, 100, 100),
        Boundaries_Positivity = c(0, 0, 100, 100, 100, 0, 100),
                       Growth = c(3, 4, 2, 4, 5, 5, 5),
  Career.Wellbeing_Positivity = c(100, 0, 0, 0, 0, 100, 100),
              eSat_Positivity = c(0, 0, 100, 100, 100, 100, 100),
                   Technolory = c(2, 5, 3, 4, 4, 1, 5)
)

Now I need a table with Average eSat_Positivity for all other variables ending with "Positivity".
I know I can do it one by one:

impact.on.scores <- source %>% 
  group_by(Action_Positivity) %>% 
  summarize(mean_Sat = mean(eSat_Positivity))
impact.on.scores

But what I really need is either eSat_Positivity mean scores for all other "Positivity" variables on the left like:

  • Action_Positivity 100
  • Action_Positivity 0
  • Boundaries_Positivity 100
  • Boundaries_Positivity 0
  • Career.Wellbeing_Positivity 100
  • Career.Wellbeing_Positivity 0

...or, if easier, two separate tables:

  1. eSat_Positivity mean scores for all "Positivity" variables scored 100:
  • Action_Positivity
  • Boundaries_Positivity
  • Career.Wellbeing_Positivity
  1. eSat_Positivity mean scores for all "Positivity" variables scored 0:
  • Action_Positivity
  • Boundaries_Positivity
  • Career.Wellbeing_Positivity

Is this possible at all?

Hi @Slavek ,

given your example this should do it:

library(dplyr)

impact.on.scores <- source %>% 
  group_by(Action_Positivity) %>% 
  select(contains("Positivity")) %>% 
  summarise_all(mean, na.rm = TRUE) %>% 
  rename_with(~ paste0("mean_", .), -Action_Positivity)
impact.on.scores

Thank you but this is not what I want.
I want to include all variables containing "Positivity" (apart from eSat_Positivity) in group_by.
In your output there are two rows. I need either 6 rows in option one or two tables with 3 rows each

eSat_Positivity_Mean
Action_Positivity 100 75
Action_Positivity 0 66.6
Boundaries_Positivity 100 100
Boundaries_Positivity 0 33.3
Career.Wellbeing_Positivity 100 66.6
Career.Wellbeing_Positivity 0 75
100 eSat_Positivity_Mean
Action_Positivity 75
Boundaries_Positivity 100
Career.Wellbeing_Positivity 66.6
0 eSat_Positivity_Mean
Action_Positivity 66.6
Boundaries_Positivity 33.3
Career.Wellbeing_Positivity 75

Hi @Slavek

library(tidyr)
library(dplyr)

impact.on.scores <- source  %>% 
  select(contains("Positivity")) %>% 
  pivot_longer(cols = -eSat_Positivity,
               names_to = "Category", 
               values_to = "Value"
               ) %>% 
  group_by(Category, Value) %>% 
  summarise(eSat_Positivity_Mean = mean(eSat_Positivity, na.rm = TRUE))

Hope this matches the request :smiley:

2 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.