regression and plotting regression

amare · May 20, 2021, 12:41pm

I want to do liner regression analysis between response variable(y) and predictor variables(for each independently considering variable pindex as confounding variable) and eventually I want to plot the significant predictor variable estimated regression vs the response variable in a ggplot2.

I know the formula how to do liner regression and I can do it for each one by one( lm(y ~ pindex + predictor, data = surgical)).

For example, for liver and enzyme

# for enzyme 
modle_enzyme  <- lm(y  ~  pindex + enzyme_test)
enzy <- pred <- broom::tidy(modle_enzyme )

# for liver
modle_liver  <- lm(y  ~  pindex + liver_test)  etc for each 
liver <- broom::tidy(modle_liver )

all <- bind_rows(enzy, liver, ...)

But doing this is really tiresome for large data set.
So I was wondering if some one can show me how to do it using loop function.
The data can be found here

library(olsrr)
data( surgical)

Thank you!

mattwarkentin · May 20, 2021, 3:12pm

Hi @amare,

Here is how I might go about it. I would pivot the data to long format, nest each predictor into its own data set (along with y and pindex), and then you can easily fit all of your models, tidy them with broom, and then unnest them to have all the coefficients together in one data set.

library(olsrr)
library(tidyverse)

data(surgical)

surgical %>% 
  pivot_longer(
    c(-pindex, -y)
  ) %>% 
  group_nest(name) %>%
  mutate(
    model = map(data, ~lm(y ~ pindex + value, .x)),
    tidy = map(model, broom::tidy)
  ) %>% 
  unnest(tidy)
#> # A tibble: 21 x 8
#>    name                data model  term     estimate std.error statistic p.value
#>    <chr>    <list<tibble[,> <list> <chr>       <dbl>     <dbl>     <dbl>   <dbl>
#>  1 age             [54 × 3] <lm>   (Interc…   267.      310.       0.861 3.93e-1
#>  2 age             [54 × 3] <lm>   pindex       9.77      2.97     3.29  1.83e-3
#>  3 age             [54 × 3] <lm>   value       -3.55      4.52    -0.786 4.36e-1
#>  4 alc_hea…        [54 × 3] <lm>   (Interc…   -65.2     173.      -0.377 7.07e-1
#>  5 alc_hea…        [54 × 3] <lm>   pindex      10.8       2.60     4.15  1.25e-4
#>  6 alc_hea…        [54 × 3] <lm>   value      461.      112.       4.12  1.38e-4
#>  7 alc_mod         [54 × 3] <lm>   (Interc…   133.      191.       0.696 4.90e-1
#>  8 alc_mod         [54 × 3] <lm>   pindex      10.6       2.91     3.64  6.34e-4
#>  9 alc_mod         [54 × 3] <lm>   value     -187.       97.7     -1.92  6.11e-2
#> 10 bcs             [54 × 3] <lm>   (Interc…  -328.      241.      -1.36  1.80e-1
#> # … with 11 more rows

mattwarkentin · May 20, 2021, 3:16pm

Here is a more concise way actually:

library(olsrr)
library(tidyverse)

data(surgical)

surgical %>% 
  pivot_longer(
    cols = c(-pindex, -y),
    names_to = "predictor"
  ) %>% 
  group_by(predictor) %>% 
  group_modify(
    ~broom::tidy(lm(y ~ pindex + value, .x))
  )
#> # A tibble: 21 x 6
#> # Groups:   predictor [7]
#>    predictor term        estimate std.error statistic  p.value
#>    <chr>     <chr>          <dbl>     <dbl>     <dbl>    <dbl>
#>  1 age       (Intercept)   267.      310.       0.861 0.393   
#>  2 age       pindex          9.77      2.97     3.29  0.00183 
#>  3 age       value          -3.55      4.52    -0.786 0.436   
#>  4 alc_heavy (Intercept)   -65.2     173.      -0.377 0.707   
#>  5 alc_heavy pindex         10.8       2.60     4.15  0.000125
#>  6 alc_heavy value         461.      112.       4.12  0.000138
#>  7 alc_mod   (Intercept)   133.      191.       0.696 0.490   
#>  8 alc_mod   pindex         10.6       2.91     3.64  0.000634
#>  9 alc_mod   value        -187.       97.7     -1.92  0.0611  
#> 10 bcs       (Intercept)  -328.      241.      -1.36  0.180   
#> # … with 11 more rows

mhakanda · May 20, 2021, 3:45pm

@mattwarkentin solution is really nice and concise. I would follow his way. But, if you like to do a bit more primitive way using for loop, you can try as follows.

pre1<-setdiff(names(surgical), c("y", "pindex"))
mod_dres<-NULL
for (j in pre1) {
  model  <- lm(y  ~  pindex + get(j), data = surgical)
  bmodel <- pred <- broom::tidy(model)
  bmodel$term[3]<-j
  mod_dres<-rbind(mod_dres,bmodel)
}

mod_dres
#> # A tibble: 21 x 5
#>    term        estimate std.error statistic     p.value
#>    <chr>          <dbl>     <dbl>     <dbl>       <dbl>
#>  1 (Intercept)  -328.      241.      -1.36  0.180      
#>  2 pindex          9.23      2.82     3.27  0.00191    
#>  3 bcs            77.1      29.7      2.60  0.0123     
#>  4 (Intercept)  -792.      206.      -3.84  0.000341   
#>  5 pindex         10.2       2.27     4.49  0.0000410  
#>  6 enzyme_test    11.0       1.81     6.08  0.000000152
#>  7 (Intercept)  -206.      162.      -1.27  0.211      
#>  8 pindex          4.67      2.53     1.84  0.0709     
#>  9 liver_test    223.       40.0      5.58  0.000000935
#> 10 (Intercept)   267.      310.       0.861 0.393      
#> # … with 11 more rows

amare · May 20, 2021, 4:33pm

Hi mhakanda,

Thank you so much! Your method also works but how to exclude the intercept and pindex?
Best,
Amare

mhakanda · May 20, 2021, 4:54pm

If you do not need pindex and intercept info, we can do

pre1<-setdiff(names(surgical), c("y", "pindex"))
mod_dres<-NULL
for (j in pre1) {
  model  <- lm(y  ~  pindex + get(j), data = surgical)
  bmodel <- broom::tidy(model)
  bmodel$term[3]<-j
  bmodel<-bmodel[3,]
  mod_dres<-rbind(mod_dres,bmodel)
}

mod_dres
#> # A tibble: 7 x 5
#>   term        estimate std.error statistic     p.value
#>   <chr>          <dbl>     <dbl>     <dbl>       <dbl>
#> 1 bcs            77.1      29.7      2.60  0.0123     
#> 2 enzyme_test    11.0       1.81     6.08  0.000000152
#> 3 liver_test    223.       40.0      5.58  0.000000935
#> 4 age            -3.55      4.52    -0.786 0.436      
#> 5 gender         97.2     100.       0.971 0.336      
#> 6 alc_mod      -187.       97.7     -1.92  0.0611     
#> 7 alc_heavy     461.      112.       4.12  0.000138

system · May 27, 2021, 4:55pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.