Replacement for dotwhisker package for tidymodels

WayneBovey · May 18, 2024, 2:40am

I recently upgraded R to the latest version (4.4.0) and discovered that the dotwhisker package, which is used to visualise tidymodels (lm engine) results in publications such as "tidymodels - Build a model", is no longer supported or available on CRAN.
Rather than using a superseded package, what other options are available, ideally with code, for drawing a dot-and-whisker plot from the output of a tidymodels workflow that uses the lm engine? Can ggplot or sjPlot be adapted to tidymodels workflow output to replace the dotwhisker package?

dromano · May 21, 2024, 8:26pm

Hi @WayneBovey ,

From your link, it should be relatively simple to render the same plot with ggplot, since the output of tidy() is a table. I can't produce the code now, but maybe someone else can in the meantime, and if not, I'll be sure to supply an example.

dromano · May 21, 2024, 8:49pm

Hi again,

Here's a rough reproduction of the plot you linked, which maybe someone can easily modify to match the original. (Full reprex at end of post.)

import "urchins" table

library(tidyverse)
urchins <-
  # Data were assembled for a tutorial 
  # at https://www.flutterbys.com.au/stats/tut/tut7.5a.html
  read_csv("https://tidymodels.org/start/models/urchins.csv") |> 
  # Change the names to be a little more verbose
  setNames(c("food_regime", "initial_volume", "width")) |> 
  # Factors are very helpful for modeling, so we convert one column
  mutate(food_regime = factor(food_regime, levels = c("Initial", "Low", "High")))

First, here's the output of the tidy() function

library(tidymodels)
linear_reg() |> 
  fit(width ~ initial_volume * food_regime, data = urchins) |> 
  tidy()
#> # A tibble: 6 × 5
#>   term                            estimate std.error statistic  p.value
#>   <chr>                              <dbl>     <dbl>     <dbl>    <dbl>
#> 1 (Intercept)                     0.0331    0.00962      3.44  0.00100 
#> 2 initial_volume                  0.00155   0.000398     3.91  0.000222
#> 3 food_regimeLow                  0.0198    0.0130       1.52  0.133   
#> 4 food_regimeHigh                 0.0214    0.0145       1.47  0.145   
#> 5 initial_volume:food_regimeLow  -0.00126   0.000510    -2.47  0.0162  
#> 6 initial_volume:food_regimeHigh  0.000525  0.000702     0.748 0.457

and here's the plot:

linear_reg() |> 
  fit(width ~ initial_volume * food_regime, data = urchins) |> 
  tidy() |> 
  # remove intercept row
  slice(-1) |> 
  # have term order match reverse of row order
  mutate(term = fct_reorder(term, row_number(), .desc = TRUE)) |> 
  ggplot(aes(estimate, term)) +
  geom_point() +
  # whiskers span estimate ± 1 standard error (not a confidence interval)
  geom_segment(
    aes(
      x = estimate - std.error,
      xend = estimate + std.error,
      yend = term
    )) +
  geom_vline(xintercept = 0, colour = "grey50", linetype = 'dashed')

Created on 2024-05-21 with reprex v2.0.2

Full reprex

library(tidyverse)
urchins <-
  # Data were assembled for a tutorial 
  # at https://www.flutterbys.com.au/stats/tut/tut7.5a.html
  read_csv("https://tidymodels.org/start/models/urchins.csv") |> 
  # Change the names to be a little more verbose
  setNames(c("food_regime", "initial_volume", "width")) |> 
  # Factors are very helpful for modeling, so we convert one column
  mutate(food_regime = factor(food_regime, levels = c("Initial", "Low", "High")))

library(tidymodels)
linear_reg() |> 
  fit(width ~ initial_volume * food_regime, data = urchins) |> 
  tidy()
#> # A tibble: 6 × 5
#>   term                            estimate std.error statistic  p.value
#>   <chr>                              <dbl>     <dbl>     <dbl>    <dbl>
#> 1 (Intercept)                     0.0331    0.00962      3.44  0.00100 
#> 2 initial_volume                  0.00155   0.000398     3.91  0.000222
#> 3 food_regimeLow                  0.0198    0.0130       1.52  0.133   
#> 4 food_regimeHigh                 0.0214    0.0145       1.47  0.145   
#> 5 initial_volume:food_regimeLow  -0.00126   0.000510    -2.47  0.0162  
#> 6 initial_volume:food_regimeHigh  0.000525  0.000702     0.748 0.457

linear_reg() |> 
  fit(width ~ initial_volume * food_regime, data = urchins) |> 
  tidy() |> 
  # remove intercept row
  slice(-1) |> 
  # have term order match reverse of row order
  mutate(term = fct_reorder(term, row_number(), .desc = TRUE)) |> 
  ggplot(aes(estimate, term)) +
  geom_point() +
  # whiskers span estimate ± 1 standard error (not a confidence interval)
  geom_segment(
    aes(
      x = estimate - std.error,
      xend = estimate + std.error,
      yend = term
    )) +
  geom_vline(xintercept = 0, colour = "grey50", linetype = 'dashed')

Created on 2024-05-21 with reprex v2.0.2

WayneBovey · May 22, 2024, 4:52am

Hello David, thank you for replying with a solution. I found that by adding the confidence-interval arguments "conf.int = TRUE, conf.level = 0.95" to the tidy() call, and making a small change to your code to use the resulting conf.low and conf.high columns in place of estimate ± std.error, it is possible to replicate the result in the "tidymodels - Build a model" publication. Thanks again for your assistance David. Regards Wayne
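
For later readers, here is a minimal sketch of what that change might look like. It is not Wayne's actual code, which was not posted. It reuses the data URL, model formula and plot layers from the reprex above; the object names `coefs` and `p`, and the use of `filter()`, `fct_inorder()` and `fct_rev()` to drop the intercept and order the terms, are choices made for this illustration.

```r
library(tidyverse)
library(tidymodels)

# Same data preparation as in the reprex above
urchins <-
  read_csv("https://tidymodels.org/start/models/urchins.csv") |>
  setNames(c("food_regime", "initial_volume", "width")) |>
  mutate(food_regime = factor(food_regime, levels = c("Initial", "Low", "High")))

# Ask tidy() for 95% confidence bounds; this adds conf.low and conf.high columns
coefs <-
  linear_reg() |>
  fit(width ~ initial_volume * food_regime, data = urchins) |>
  tidy(conf.int = TRUE, conf.level = 0.95) |>
  # drop the intercept row by name rather than by position
  filter(term != "(Intercept)") |>
  # keep the table's row order, reversed so the first term is drawn at the top
  mutate(term = fct_rev(fct_inorder(term)))

# Whiskers now run from conf.low to conf.high instead of estimate ± std.error
p <-
  ggplot(coefs, aes(estimate, term)) +
  geom_point() +
  geom_segment(aes(x = conf.low, xend = conf.high, yend = term)) +
  geom_vline(xintercept = 0, colour = "grey50", linetype = "dashed")

p
```

The only substantive difference from the earlier plot is the whisker length: a 95% interval is wider than ± one standard error, which is why the earlier version looked narrower than the figure in the tidymodels article.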
