Can you identify via ggplot_build whether a colour/fill scale is continuous or discrete?

davidhodge931 · October 25, 2023, 8:10pm

Hi ggplot2 experts!

I was wondering if it was possible to identify via ggplot_build whether a colour/fill scale is continuous or discrete?

I would like to be able to obtain this information programmatically based on the output of ggplot_build.

Thanks!
David

library(tidyverse)
#> Warning: package 'purrr' was built under R version 4.3.1
#> Warning: package 'dplyr' was built under R version 4.3.1
#> Warning: package 'lubridate' was built under R version 4.3.1
library(palmerpenguins)

p1 <- penguins |> 
  ggplot() +
  geom_point(aes(x = flipper_length_mm, y = body_mass_g, colour = species))

p2 <- penguins |> 
  ggplot() +
  geom_point(aes(x = flipper_length_mm, y = body_mass_g, colour = bill_length_mm))

plot_build1 <- ggplot_build(p1)
plot_data1 <- plot_build1$data[[1]] |> tibble()

plot_build2 <- ggplot_build(p2)
plot_data2 <- plot_build2$data[[1]] |> tibble()

Created on 2023-10-26 with reprex v2.0.2

AlexisW · October 26, 2023, 5:14pm

I'm by no means an expert in ggplot2 internals, but some observations: the build output has 3 components,

> names(plot_build1)
[1] "data"   "layout" "plot"

The most promising one in my opinion is layout, but:

> waldo::compare(plot_build1$layout, plot_build2$layout)
✔ No differences

so there is no hope there.

The plot component contains the original plot, so it does have the information you seek:

> plot_build1$plot$scales$scales[[3]]$aesthetics
[1] "colour"
> plot_build1$plot$scales$scales[[3]]$call
discrete_scale(aesthetics = aesthetics, scale_name = "hue", palette = hue_pal(h, 
    c, l, h.start, direction), na.value = na.value)

> plot_build2$plot$scales$scales[[3]]$aesthetics
[1] "colour"
> plot_build2$plot$scales$scales[[3]]$call
continuous_scale(aesthetics = aesthetics, scale_name = "gradient", 
    palette = seq_gradient_pal(low, high, space), na.value = na.value, 
    guide = guide)

In a way it's cheating, as this is the plot structure rather than the build result, but this part of the structure is indeed computed by ggplot_build(), so maybe it counts.
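The same lookup can probably be done without hard-coding the index 3. The sketch below assumes the built object still exposes the plot's scales list with its get_scales() method, and that scale objects carry the ScaleDiscrete, ScaleBinned and ScaleContinuous classes. These are ggplot2 internals and may change between versions. The helper name scale_type_of is made up for illustration, and iris stands in for penguins so that only ggplot2 is needed.

```r
library(ggplot2)

# Hypothetical helper (not part of ggplot2): classify the scale that
# ggplot_build() ends up using for one aesthetic.
scale_type_of <- function(plot, aesthetic = "colour") {
  built <- ggplot_build(plot)

  # Assumption: the scales list keeps its get_scales() method, which
  # returns NULL when no scale handles the aesthetic.
  scale <- built$plot$scales$get_scales(aesthetic)

  if (is.null(scale)) {
    NA_character_
  } else if (inherits(scale, "ScaleDiscrete")) {
    "discrete"
  } else if (inherits(scale, "ScaleBinned")) {
    "binned"
  } else if (inherits(scale, "ScaleContinuous")) {
    "continuous"
  } else {
    "other"
  }
}

# iris is used here only as a dependency-free stand-in for penguins.
p_discrete <- ggplot(iris) +
  geom_point(aes(x = Sepal.Length, y = Sepal.Width, colour = Species))

p_continuous <- ggplot(iris) +
  geom_point(aes(x = Sepal.Length, y = Sepal.Width, colour = Petal.Length))

scale_type_of(p_discrete)
scale_type_of(p_continuous)
scale_type_of(p_continuous, "fill") # no fill scale in this plot
```

Looking the scale up by aesthetic name rather than by position should also cope with plots where the colour scale is not the third one in the list.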

Finally, the first component:

> waldo::compare(plot_build1$data, plot_build2$data)
     old[[1]]$colour | new[[1]]$colour                 
 [1] "#F8766D"       - "#234A6D"       [1]             
 [2] "#F8766D"       - "#244C6F"       [2]             
 [3] "#F8766D"       - "#255074"       [3]             
 [4] "#F8766D"       - "grey50"        [4]             
 [5] "#F8766D"       - "#1D3F5E"       [5]             
 [6] "#F8766D"       - "#234B6E"       [6]             
 [7] "#F8766D"       - "#22496C"       [7]             
 [8] "#F8766D"       - "#234A6D"       [8]             
 [9] "#F8766D"       - "#17344F"       [9]             
[10] "#F8766D"       - "#29587F"       [10]            
 ... ...               ...             and 334 more ...

`attr(old[[1]]$group, 'n')`: 3
`attr(new[[1]]$group, 'n')`: 1

`old[[1]]$group`:  1  1  1  1  1  1  1  1  1  1 and 334 more...
`new[[1]]$group`: -1 -1 -1 -1 -1 -1 -1 -1 -1 -1             ...

So the data does not contain that information explicitly, but you can see that with a discrete scale, the colour column corresponds to the group column, whereas with a continuous scale they are not related. Things could get a bit more complicated if you have several grouping factors:

p3 <- penguins |> 
  ggplot() +
  geom_point(aes(x = flipper_length_mm, y = body_mass_g, colour = species, shape = island))

dat3 <- layer_data(p3) |> tibble()

table(dat3$colour, dat3$group)
            1   2   3   4   5
  #00BA38   0   0   0  68   0
  #619CFF   0   0   0   0 124
  #F8766D  44  56  52   0   0

but the two cases should still be distinguishable, since each group still maps to a single colour.
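One way to turn this observation into a check is sketched below. It is only a heuristic: it assumes the layer data has a colour column, and it would also return TRUE for a layer with a constant colour and some other discrete aesthetic. The helper name colour_follows_group is hypothetical, and iris again stands in for penguins.

```r
library(ggplot2)

# Hypothetical heuristic: TRUE when the layer has real groups (not -1)
# and every group shows exactly one colour value.
colour_follows_group <- function(plot, layer = 1L) {
  dat <- layer_data(plot, layer)

  # Number of distinct colours within each group
  colours_per_group <- tapply(
    dat$colour, dat$group,
    function(x) length(unique(x))
  )

  any(dat$group != -1) && all(colours_per_group == 1)
}

p_species <- ggplot(iris) +
  geom_point(aes(x = Sepal.Length, y = Sepal.Width, colour = Species))

p_petal <- ggplot(iris) +
  geom_point(aes(x = Sepal.Length, y = Sepal.Width, colour = Petal.Length))

colour_follows_group(p_species) # discrete colour
colour_follows_group(p_petal)   # continuous colour
```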

Finally, this does sound a bit like an XY problem. Are you sure this is the right approach? What is the context?

scottyd22 · October 26, 2023, 5:20pm

Hi @davidhodge931. Upon inspection, it looks like the group variable can be used to distinguish between discrete and continuous colour scales. Continuing your example, I added a third case that matches case two, except that bill_length_mm is wrapped in factor() inside aes().

p3 <- penguins |> 
  ggplot() +
  geom_point(aes(x = flipper_length_mm, y = body_mass_g, colour = factor(bill_length_mm)))

plot_build3 <- ggplot_build(p3)
plot_data3 <- plot_build3$data[[1]] |> tibble()

As you can see, in discrete cases (#1, #3), group takes on a positive integer value. For the continuous case (#2), group is -1.

count(plot_data1, group) # discrete
#> # A tibble: 3 × 2
#>   group     n
#>   <int> <int>
#> 1     1   152
#> 2     2    68
#> 3     3   124

count(plot_data2, group) # continuous
#> # A tibble: 1 × 2
#>   group     n
#>   <int> <int>
#> 1    -1   344

count(plot_data3, group) # discrete
#> # A tibble: 165 × 2
#>    group     n
#>    <int> <int>
#>  1     1     1
#>  2     2     1
#>  3     3     1
#>  4     4     1
#>  5     5     1
#>  6     6     1
#>  7     7     1
#>  8     8     2
#>  9     9     2
#> 10    10     1
#> # ℹ 155 more rows

Created on 2023-10-26 with reprex v2.0.2
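A minimal sketch of that check follows, with one caveat worth keeping in mind. By default ggplot2 sets group to the interaction of all discrete variables in the layer, so group only reflects the colour scale when colour is the sole candidate. The helper name layer_is_ungrouped is made up for illustration, and iris stands in for penguins to keep the block dependent on ggplot2 alone.

```r
library(ggplot2)

# Hypothetical helper: TRUE when the layer has no grouping at all,
# i.e. group is -1 for every row of the built layer data.
layer_is_ungrouped <- function(plot, layer = 1L) {
  all(layer_data(plot, layer)$group == -1)
}

# Continuous colour only: no discrete variable in the layer
p_cont <- ggplot(iris) +
  geom_point(aes(x = Sepal.Length, y = Sepal.Width, colour = Petal.Length))

# Continuous colour plus a discrete shape: the shape variable alone
# is enough to create groups, although the colour scale is continuous
p_mixed <- ggplot(iris) +
  geom_point(aes(x = Sepal.Length, y = Sepal.Width,
                 colour = Petal.Length, shape = Species))

layer_is_ungrouped(p_cont)
layer_is_ungrouped(p_mixed)
```

In the second plot the groups come from shape, so a group-based test on its own would misreport the colour scale as discrete there.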
