How to apply a linear regression function to each group and store the coefficients in columns

Hello coders,

I have the following data frame:

d <- structure(list(Group = c("A", "A", "A", "A", "A", "B", "B", "B", 
"B", "B", "C", "C", "C", "C", "C", "D", "D", "D", "D", "D"), 
    D = c(0.1, 0.2, 0.2, 0.1, 0.1, 0.5, 0.5, 0.1, 0.1, 0.5, 0.1, 
    0.5, 0.5, 0.1, 0.5, 0.1, 0.5, 0.5, 0.1, 0.5), LF = c(8.4368504387337, 
    8.47470313979528, 8.38640090116621, 6.06181547038399, 6.19275193549284, 
    5.9648638653108, 6.18325202989082, 6.77963777620169, 5.88425046280083, 
    6.30863403147467, 5.96856546407038, 6.43276144155514, 6.33398985544227, 
    7.5847730776122, 5.75127309348355, 5.78970529642155, 5.99057129113077, 
    5.85363823020058, 6.14846829591765, 6.22689592921589)), row.names = c(NA, 
-20L), class = "data.frame")

For each group (A-D) in data frame d, I would like to fit a linear regression of LF on D. Then, I want to store the following values from the model summary in columns called r2 and Kd:

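fit <- lm(LF ~ D, data = d)           # avoid naming the object `lm`, which masks the function
fit_summary <- summary(fit)
r2 <- fit_summary$r.squared           # R-squared of the fit
Kd <- fit_summary$coefficients[2, 1]  # slope estimate for D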

Does anyone know how to iterate this over each group in the data frame? I'm sure group_by and lapply would work in this situation, but I'm having trouble with the syntax. Each group will have repeating values for r2 and Kd and that's fine.

Thanks so much for any help you can provide!

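Does this get you what you want? lapply() does not combine with group_by(): group_by() only marks the groups, and it is dplyr verbs such as mutate() and summarise() that act on them, whereas base R's lapply() ignores the grouping. Base R functions such as lm() still work fine inside mutate().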

d <- structure(list(Group = c("A", "A", "A", "A", "A", "B", "B", "B", 
                              "B", "B", "C", "C", "C", "C", "C", "D", "D", "D", "D", "D"), 
                    D = c(0.1, 0.2, 0.2, 0.1, 0.1, 0.5, 0.5, 0.1, 0.1, 0.5, 0.1, 
                          0.5, 0.5, 0.1, 0.5, 0.1, 0.5, 0.5, 0.1, 0.5), 
                    LF = c(8.4368504387337, 8.47470313979528, 8.38640090116621, 6.06181547038399, 6.19275193549284, 
                           5.9648638653108, 6.18325202989082, 6.77963777620169, 5.88425046280083, 
                           6.30863403147467, 5.96856546407038, 6.43276144155514, 6.33398985544227, 
                           7.5847730776122, 5.75127309348355, 5.78970529642155, 5.99057129113077, 
                           5.85363823020058, 6.14846829591765, 6.22689592921589)), 
               row.names = c(NA, -20L), class = "data.frame")


library(dplyr)
d <- d |> group_by(Group) |> 
  mutate(r2 = summary(lm(LF ~ D))$r.squared,
         kd = summary(lm(LF ~ D))$coefficients[2,1])
d
#> # A tibble: 20 × 5
#> # Groups:   Group [4]
#>    Group     D    LF     r2     kd
#>    <chr> <dbl> <dbl>  <dbl>  <dbl>
#>  1 A       0.1  8.44 0.442  15.3  
#>  2 A       0.2  8.47 0.442  15.3  
#>  3 A       0.2  8.39 0.442  15.3  
#>  4 A       0.1  6.06 0.442  15.3  
#>  5 A       0.1  6.19 0.442  15.3  
#>  6 B       0.5  5.96 0.0775 -0.449
#>  7 B       0.5  6.18 0.0775 -0.449
#>  8 B       0.1  6.78 0.0775 -0.449
#>  9 B       0.1  5.88 0.0775 -0.449
#> 10 B       0.5  6.31 0.0775 -0.449
#> 11 C       0.1  5.97 0.217  -1.51 
#> 12 C       0.5  6.43 0.217  -1.51 
#> 13 C       0.5  6.33 0.217  -1.51 
#> 14 C       0.1  7.58 0.217  -1.51 
#> 15 C       0.5  5.75 0.217  -1.51 
#> 16 D       0.1  5.79 0.0257  0.137
#> 17 D       0.5  5.99 0.0257  0.137
#> 18 D       0.5  5.85 0.0257  0.137
#> 19 D       0.1  6.15 0.0257  0.137
#> 20 D       0.5  6.23 0.0257  0.137

Created on 2023-08-24 with reprex v2.0.2
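
Since the question mentioned lapply(), here is a rough base R sketch of the same idea without dplyr: split() the data frame by Group, fit one model per piece, and bind the results into one row per group. The names fit_group, per_group and d_with_fits are made up for illustration, and the data is the example data from the question, written with rep() for brevity.

```r
# Example data from the question
d <- data.frame(
  Group = rep(c("A", "B", "C", "D"), each = 5),
  D = c(0.1, 0.2, 0.2, 0.1, 0.1, 0.5, 0.5, 0.1, 0.1, 0.5,
        0.1, 0.5, 0.5, 0.1, 0.5, 0.1, 0.5, 0.5, 0.1, 0.5),
  LF = c(8.4368504387337, 8.47470313979528, 8.38640090116621,
         6.06181547038399, 6.19275193549284, 5.9648638653108,
         6.18325202989082, 6.77963777620169, 5.88425046280083,
         6.30863403147467, 5.96856546407038, 6.43276144155514,
         6.33398985544227, 7.5847730776122, 5.75127309348355,
         5.78970529642155, 5.99057129113077, 5.85363823020058,
         6.14846829591765, 6.22689592921589)
)

# Illustrative helper: fit LF ~ D on one group's rows and
# return a one-row data frame with r2 and the slope (Kd)
fit_group <- function(g) {
  s <- summary(lm(LF ~ D, data = g))
  data.frame(Group = g$Group[1],
             r2 = s$r.squared,
             Kd = s$coefficients[2, 1])
}

# One row per group
per_group <- do.call(rbind, lapply(split(d, d$Group), fit_group))
per_group

# Optional: repeat the per-group values on every original row
d_with_fits <- merge(d, per_group, by = "Group")
```

per_group is handy when you only need one line per group; the merge() step gives the repeated-values layout asked for in the question.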

Gah, that is SO close to working. I got this error:

Error: Problem with `mutate()` input `kd`.
x subscript out of bounds

Do you know what this means?

Please show the actual code you ran. Did you use your example data set as I did?

I was not using the example data, and had some groups with only two rows, which obviously doesn't work for a linear model. I removed the entries with only two rows and it worked. Thank you!!!
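
For anyone landing here with the same error, one plausible explanation: summary() of an lm fit leaves coefficients that cannot be estimated out of its coefficient table, so if D has only one distinct value within a group (easy to hit with very small groups), the table contains just the intercept row and [2, 1] is out of bounds. A two-row group with two different D values should still fit, just with an r2 of 1. Below is a minimal sketch of a guard that returns NA for such groups instead of dropping them. The toy data and the helper name fit_stats are hypothetical, not from the original poster's data.

```r
library(dplyr)

# Hypothetical toy data: group "flat" has a single distinct D value,
# so its slope cannot be estimated
toy <- data.frame(
  Group = c("ok", "ok", "ok", "flat", "flat"),
  D     = c(0.1, 0.2, 0.3, 0.5, 0.5),
  LF    = c(1.0, 2.1, 2.9, 4.0, 5.0)
)

# Illustrative helper: one fit per group, NA when x does not vary
fit_stats <- function(y, x) {
  ok <- !is.na(x) & !is.na(y)
  if (length(unique(x[ok])) < 2) {
    return(data.frame(r2 = NA_real_, Kd = NA_real_))
  }
  fit <- lm(y ~ x)
  data.frame(r2 = summary(fit)$r.squared,
             Kd = coef(fit)[["x"]])  # slope, taken from coef() by name
}

# An unnamed data frame returned inside mutate() is unpacked into columns
toy_fits <- toy |>
  group_by(Group) |>
  mutate(fit_stats(LF, D)) |>
  ungroup()
toy_fits
```

This also fits the model once per group rather than twice. Whether NA or removal is the right treatment for such groups depends on the analysis.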
