Using group_by and loops

kurt.bem · February 11, 2022, 8:26pm

for(sex in c('M', 'E')){
  for(unit in c('ST','AL', 'SC')){
    for (group in c('RT', 'SX', 'DS', 'LP')) {
      
    data<- data %>%
		mutate(n = 1)
		group_by(code,level,performance,administered, GROUP)
		dplyr::summmarise(n = sum(n))
    
    }
  }
}

I am trying to group by one of the variables in a loop to get the number of occurrences in each instance. I have included an example of the code. The uppercase variable in the group_by is the loop's local variable.

technocrat · February 11, 2022, 10:22pm

A little representative data would help. See the FAQ: How to do a minimal reproducible example reprex for beginners.

The code doesn't work as expected for three reasons:

1. The loop body doesn't return anything; it just makes an assignment.
2. Even if it returned the data object, that would not be the original data object, because each pass rebinds the name data to the modified result, and one name can refer to only one object at a time.
3. The data object is reassigned on every pass of the innermost loop, and each assignment wipes out the previous iteration's value. So only the last result is available to be returned.
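
The overwriting problem can be seen, and avoided, by storing each pass's result in its own slot of a list instead of reusing one name. This is only a minimal sketch in base R; the data frame Data, its columns sex and unit, and their values are made up for illustration, since the original data were not posted.

```r
# Toy data; the column names and values are assumptions for illustration only.
Data <- data.frame(
  sex  = c("M", "M", "E", "E", "E"),
  unit = c("ST", "AL", "ST", "ST", "SC")
)

results <- list()                      # one slot per loop pass
for (s in c("M", "E")) {
  # Count the rows for this value of sex, broken down by unit
  results[[s]] <- table(Data$unit[Data$sex == s])
}

names(results)    # both passes survive: "M" "E"
results[["E"]]    # counts of unit among the rows where sex is "E"
```

Because every pass writes to a different element of results, nothing is lost when the loop finishes.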

A better approach in R is to reason about the problem along the following lines: f(x) = y.

x is what there is to begin with. Assume it is a data frame, Data (uppercase to distinguish it from data, the name of a function, because some operations give the name of the function precedence). Data contains variables (columns) named, say, sex, unit and group (ok because, even though {dplyr} has several functions beginning with group_by, there is no group function). In addition, we have code, level, performance, and administered. (I can't guess where GROUP comes from.) Assume that Data has other columns that are uninteresting and that Data is tidy (each row is uniquely identifiable).

y is a collection of contingency tables for Data

f might already exist somewhere to prepare the collection, but it would be difficult to find, so we will compose f from simpler functions.

An f_1 that prepares an individual contingency table will be required. For that there is base::table().

head(warpbreaks)
#>   breaks wool tension
#> 1     26    A       L
#> 2     30    A       L
#> 3     54    A       L
#> 4     25    A       L
#> 5     70    A       L
#> 6     52    A       L
with(warpbreaks, table(wool, tension))
#>     tension
#> wool L M H
#>    A 9 9 9
#>    B 9 9 9

table() takes the wool and tension variables of the built-in data frame warpbreaks and returns the number of occurrences of each combination of their values. For x this is analogous to the combination of code, level, performance and administered, so we could take each of the combinations of sex, unit and group in Data and apply table() as our f_1. That leaves getting each such combination.
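
To make the analogy concrete, here is a hedged sketch of table() applied to a stand-in for Data. The column names sex and unit follow the thread, but the values are invented, so treat this as an illustration, not the poster's actual result.

```r
# Toy data; names follow the thread, values are assumptions.
Data <- data.frame(
  sex  = c("M", "M", "E", "E", "E", "M"),
  unit = c("ST", "ST", "AL", "SC", "AL", "ST")
)

# Two-way contingency table: rows are sex, columns are unit
counts <- with(Data, table(sex, unit))
counts

# The same counts in long form, one row per combination of sex and unit
as.data.frame(counts, responseName = "n")
```

The long form is close to what a group_by() followed by summarise() would give, except that combinations with zero occurrences are kept.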

f_2 thus takes the possible values of the variables of an object like Data and returns every combination of them. That's what base::expand.grid() does.

expand.grid(height = seq(60, 80, 5), weight = seq(100, 300, 50),
            sex = c("Male","Female"))
#>    height weight    sex
#> 1      60    100   Male
#> 2      65    100   Male
#> 3      70    100   Male
#> 4      75    100   Male
#> 5      80    100   Male
#> 6      60    150   Male
#> 7      65    150   Male
#> 8      70    150   Male
#> 9      75    150   Male
#> 10     80    150   Male
#> 11     60    200   Male
#> 12     65    200   Male
#> 13     70    200   Male
#> 14     75    200   Male
#> 15     80    200   Male
#> 16     60    250   Male
#> 17     65    250   Male
#> 18     70    250   Male
#> 19     75    250   Male
#> 20     80    250   Male
#> 21     60    300   Male
#> 22     65    300   Male
#> 23     70    300   Male
#> 24     75    300   Male
#> 25     80    300   Male
#> 26     60    100 Female
#> 27     65    100 Female
#> 28     70    100 Female
#> 29     75    100 Female
#> 30     80    100 Female
#> 31     60    150 Female
#> 32     65    150 Female
#> 33     70    150 Female
#> 34     75    150 Female
#> 35     80    150 Female
#> 36     60    200 Female
#> 37     65    200 Female
#> 38     70    200 Female
#> 39     75    200 Female
#> 40     80    200 Female
#> 41     60    250 Female
#> 42     65    250 Female
#> 43     70    250 Female
#> 44     75    250 Female
#> 45     80    250 Female
#> 46     60    300 Female
#> 47     65    300 Female
#> 48     70    300 Female
#> 49     75    300 Female
#> 50     80    300 Female

Because the result of one function can be the argument of another, we can apply table() to the output of expand.grid()

table(expand.grid(height = seq(60, 80, 5), weight = seq(100, 300, 50),
            sex = c("Male","Female")))
#> , , sex = Male
#> 
#>       weight
#> height 100 150 200 250 300
#>     60   1   1   1   1   1
#>     65   1   1   1   1   1
#>     70   1   1   1   1   1
#>     75   1   1   1   1   1
#>     80   1   1   1   1   1
#> 
#> , , sex = Female
#> 
#>       weight
#> height 100 150 200 250 300
#>     60   1   1   1   1   1
#>     65   1   1   1   1   1
#>     70   1   1   1   1   1
#>     75   1   1   1   1   1
#>     80   1   1   1   1   1

If the parentheses start to become blurry:

expand.grid(height = seq(60, 80, 5), weight = seq(100, 300, 50),
            sex = c("Male","Female")) |> table()

Because expand.grid() will take a list of vectors (columns), we can give it one.

f_3 would do the same: expand.grid() on sex, unit and group in Data.
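
One way f_3 might look, sketched in base R under stated assumptions: Data and its values are invented here, the choice of level as the tabulated variable is arbitrary, and the names combos and tables are hypothetical helpers introduced only for this illustration.

```r
# Toy data; column names follow the thread, values are made up.
Data <- data.frame(
  sex   = c("M", "M", "E", "E", "E", "M"),
  unit  = c("ST", "ST", "AL", "SC", "AL", "ST"),
  group = c("RT", "SX", "RT", "DS", "RT", "RT"),
  level = c(1, 2, 1, 1, 2, 1)
)

# Every possible combination of the observed values of the three variables
combos <- expand.grid(
  sex   = unique(Data$sex),
  unit  = unique(Data$unit),
  group = unique(Data$group),
  stringsAsFactors = FALSE
)

# For each combination, tabulate level within the matching rows
tables <- lapply(seq_len(nrow(combos)), function(i) {
  rows <- Data$sex == combos$sex[i] &
    Data$unit == combos$unit[i] &
    Data$group == combos$group[i]
  table(Data$level[rows])
})
names(tables) <- paste(combos$sex, combos$unit, combos$group, sep = "_")

nrow(combos)          # 2 * 3 * 3 = 18 combinations
tables[["M_ST_RT"]]   # level counts for sex M, unit ST, group RT
```

Combinations that never occur in Data produce empty tables, which can be dropped afterwards if they are not wanted.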

Come back with some of Data if you need help implementing this.

mikecrobp · February 16, 2022, 8:39am

Are you missing 2 %>% between the mutate and group_by and group_by and summarise?
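
If so, a piped version could look roughly like the sketch below. It assumes {dplyr} is installed and uses an invented Data with columns sex, unit and code; it also drops the loops entirely and groups by all the variables at once, which may or may not match what was intended.

```r
library(dplyr)

# Toy data; names follow the thread, values are assumptions.
Data <- data.frame(
  sex  = c("M", "M", "E", "E", "E", "M"),
  unit = c("ST", "ST", "AL", "SC", "AL", "ST"),
  code = c("a", "a", "b", "a", "b", "b")
)

counts <- Data %>%
  mutate(n = 1) %>%                          # pipe restored after mutate()
  group_by(sex, unit, code) %>%              # pipe restored after group_by()
  summarise(n = sum(n), .groups = "drop")    # one row per observed combination

counts
```

For plain counting, count(Data, sex, unit, code) should give the same numbers in one step.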
