regression and plotting regression

I want to do liner regression analysis between response variable(y) and predictor variables(for each independently considering variable pindex as confounding variable) and eventually I want to plot the significant predictor variable estimated regression vs the response variable in a ggplot2.

I know the formula how to do liner regression and I can do it for each one by one( lm(y ~ pindex + predictor, data = surgical)).

For example, for liver and enzyme

# for enzyme 
modle_enzyme  <- lm(y  ~  pindex + enzyme_test)
enzy <- pred <- broom::tidy(modle_enzyme )

# for liver
modle_liver  <- lm(y  ~  pindex + liver_test)  etc for each 
liver <- broom::tidy(modle_liver )

all <- bind_rows(enzy, liver, ...)

But doing this is really tiresome for large data set.
So I was wondering if some one can show me how to do it using loop function.
The data can be found here

library(olsrr)
data( surgical)

Thank you!

Hi @amare,

Here is how I might go about it. I would pivot the data to long format, nest each predictor into its own data set (along with y and pindex), and then you can easily fit all of your models, tidy them with broom, and then unnest them to have all the coefficients together in one data set.

library(olsrr)
library(tidyverse)

data(surgical)

surgical %>% 
  pivot_longer(
    c(-pindex, -y)
  ) %>% 
  group_nest(name) %>%
  mutate(
    model = map(data, ~lm(y ~ pindex + value, .x)),
    tidy = map(model, broom::tidy)
  ) %>% 
  unnest(tidy)
#> # A tibble: 21 x 8
#>    name                data model  term     estimate std.error statistic p.value
#>    <chr>    <list<tibble[,> <list> <chr>       <dbl>     <dbl>     <dbl>   <dbl>
#>  1 age             [54 × 3] <lm>   (Interc…   267.      310.       0.861 3.93e-1
#>  2 age             [54 × 3] <lm>   pindex       9.77      2.97     3.29  1.83e-3
#>  3 age             [54 × 3] <lm>   value       -3.55      4.52    -0.786 4.36e-1
#>  4 alc_hea…        [54 × 3] <lm>   (Interc…   -65.2     173.      -0.377 7.07e-1
#>  5 alc_hea…        [54 × 3] <lm>   pindex      10.8       2.60     4.15  1.25e-4
#>  6 alc_hea…        [54 × 3] <lm>   value      461.      112.       4.12  1.38e-4
#>  7 alc_mod         [54 × 3] <lm>   (Interc…   133.      191.       0.696 4.90e-1
#>  8 alc_mod         [54 × 3] <lm>   pindex      10.6       2.91     3.64  6.34e-4
#>  9 alc_mod         [54 × 3] <lm>   value     -187.       97.7     -1.92  6.11e-2
#> 10 bcs             [54 × 3] <lm>   (Interc…  -328.      241.      -1.36  1.80e-1
#> # … with 11 more rows

Here is a more concise way actually:

library(olsrr)
library(tidyverse)

data(surgical)

surgical %>% 
  pivot_longer(
    cols = c(-pindex, -y),
    names_to = "predictor"
  ) %>% 
  group_by(predictor) %>% 
  group_modify(
    ~broom::tidy(lm(y ~ pindex + value, .x))
  )
#> # A tibble: 21 x 6
#> # Groups:   predictor [7]
#>    predictor term        estimate std.error statistic  p.value
#>    <chr>     <chr>          <dbl>     <dbl>     <dbl>    <dbl>
#>  1 age       (Intercept)   267.      310.       0.861 0.393   
#>  2 age       pindex          9.77      2.97     3.29  0.00183 
#>  3 age       value          -3.55      4.52    -0.786 0.436   
#>  4 alc_heavy (Intercept)   -65.2     173.      -0.377 0.707   
#>  5 alc_heavy pindex         10.8       2.60     4.15  0.000125
#>  6 alc_heavy value         461.      112.       4.12  0.000138
#>  7 alc_mod   (Intercept)   133.      191.       0.696 0.490   
#>  8 alc_mod   pindex         10.6       2.91     3.64  0.000634
#>  9 alc_mod   value        -187.       97.7     -1.92  0.0611  
#> 10 bcs       (Intercept)  -328.      241.      -1.36  0.180   
#> # … with 11 more rows
1 Like

@mattwarkentin solution is really nice and concise. I would follow his way. But, if you like to do a bit more primitive way using for loop, you can try as follows.

pre1<-setdiff(names(surgical), c("y", "pindex"))
mod_dres<-NULL
for (j in pre1) {
  model  <- lm(y  ~  pindex + get(j), data = surgical)
  bmodel <- pred <- broom::tidy(model)
  bmodel$term[3]<-j
  mod_dres<-rbind(mod_dres,bmodel)
}

mod_dres
#> # A tibble: 21 x 5
#>    term        estimate std.error statistic     p.value
#>    <chr>          <dbl>     <dbl>     <dbl>       <dbl>
#>  1 (Intercept)  -328.      241.      -1.36  0.180      
#>  2 pindex          9.23      2.82     3.27  0.00191    
#>  3 bcs            77.1      29.7      2.60  0.0123     
#>  4 (Intercept)  -792.      206.      -3.84  0.000341   
#>  5 pindex         10.2       2.27     4.49  0.0000410  
#>  6 enzyme_test    11.0       1.81     6.08  0.000000152
#>  7 (Intercept)  -206.      162.      -1.27  0.211      
#>  8 pindex          4.67      2.53     1.84  0.0709     
#>  9 liver_test    223.       40.0      5.58  0.000000935
#> 10 (Intercept)   267.      310.       0.861 0.393      
#> # … with 11 more rows
1 Like

Hi mhakanda,

Thank you so much! Your method also works but how to exclude the intercept and pindex?
Best,
Amare

If you do not need pindex and intercept info, we can do

pre1<-setdiff(names(surgical), c("y", "pindex"))
mod_dres<-NULL
for (j in pre1) {
  model  <- lm(y  ~  pindex + get(j), data = surgical)
  bmodel <- broom::tidy(model)
  bmodel$term[3]<-j
  bmodel<-bmodel[3,]
  mod_dres<-rbind(mod_dres,bmodel)
}

mod_dres
#> # A tibble: 7 x 5
#>   term        estimate std.error statistic     p.value
#>   <chr>          <dbl>     <dbl>     <dbl>       <dbl>
#> 1 bcs            77.1      29.7      2.60  0.0123     
#> 2 enzyme_test    11.0       1.81     6.08  0.000000152
#> 3 liver_test    223.       40.0      5.58  0.000000935
#> 4 age            -3.55      4.52    -0.786 0.436      
#> 5 gender         97.2     100.       0.971 0.336      
#> 6 alc_mod      -187.       97.7     -1.92  0.0611     
#> 7 alc_heavy     461.      112.       4.12  0.000138
1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.