How to reproduce NYTimes beeswarm plots in ggplot

Hi there!

I am trying to reproduce the "Vaccination rates by country income level" (Covid World Vaccination Tracker - The New York Times) using ggplot:

I specifically love the "gravitation" effect. How the points/circles with similar values are glued together without overlapping. Screenshot from 2021-05-14 09-41-50

I've tried geom_point() , geom_jitter(), geom_beeswarm() with different parameters without luck. With enough patience and jittering, you can get somehow close, but not enough.

Is there a geom_point_attract() similar to ggrepel::geom_label_repel() to create glued repelled points to reproduce the NYTimes plot?

See below some experimentation:

#> Loading required package: ggplot2

N = 100
DF = tibble(country = sample(c("Low income", "High income"), N, TRUE),
            value = sample(c(1, 1.01,1.54, 1.56, 1.87, 1.85, 2, 2.01, 2.02), N, TRUE), #runif(n = N, min = 1, max = 2)
            size_dot = round(runif(n = N, min = 1, max = 30), 0))

ggplot(DF, aes(value, country, fill = country )) +
  ggbeeswarm::geom_beeswarm(aes(size = size_dot), shape = 21, alpha = .8, groupOnX = FALSE, priority = "random") +
  theme_minimal() +
  theme(legend.position = "none")

ggplot(DF, aes(value, country, fill = country)) +
  geom_point(aes(size = size_dot), shape = 21, alpha = .8, position = position_dodge2(width = .2)) +
  theme_minimal() +
  theme(legend.position = "none")

ggplot(DF, aes(value, country, fill = country)) +
  geom_point(aes(size = size_dot), shape = 21, alpha = .8, position = position_jitterdodge(jitter.width = .07, jitter.height = .05, seed = 5)) +
  theme_minimal() +
  theme(legend.position = "none")

ggplot(DF, aes(value, country, fill = country)) +
  geom_jitter(aes(size = size_dot), shape = 21, alpha = .8, width = .05, height = .05) +
  theme_minimal() +
  theme(legend.position = "none")

EDIT: solution

OK, it seems this kind of thing is called circle packing (Circular Packing | the R Graph Gallery) and there is a very nice package that helps with it: {packingcircles} (GitHub - mbedward/packcircles: R package for circle packing).

Getting to the NYTimes-type plot is not trivial, so I ended up creating a repo with a few functions to take care of the complexity: GitHub - gorkang/linearpackcircles.

The end result:

Created on 2021-05-14 by the reprex package (v2.0.0)



There is an old issue in ggrepel (Add repel geom for points · Issue #20 · slowkow/ggrepel · GitHub), and an old pull request adding a geom_text_repel (Geom point repel by zachcp · Pull Request #38 · slowkow/ggrepel · GitHub).

But sadly, some parameters (shape, fill, size) are not working. Using the zachcp version (GitHub - zachcp/ggrepel at geom_point_repel) with geom_text_repel:

1 Like

Interesting question! I don't know if this is working without post-editing and moving the points up and down a bit. Maybe one can modify the alignment function that is used in ggbeeswarm. There is the cex - value that describes the allowed distance from the plot but this seems to be calculated from the middle of the point, not its outline.

However, a few points to come a bit closer to the solution:

  1. The original data is more fine-grained, you have more points at exactly the same spot, whereas these points are more spread along the x-axis. Naturally they are easier to distribute.
    I'd use fewer points with a log-normal distribution to have less of the higher points:
N = 50
DF = tibble(country = sample(c("Low income", "High income"), N, TRUE),
            value = runif(n = N, min = 1, max = 2),
            size_dot = rlnorm(n = N, 1))
  1. You may have noticed the size differences aren't very big in your plot regardless of the inputs. Actually the maximum point size is set by default to 6. You can adjust this with scale_size, e.g: from 1-20 with scale_size(range = c(1,20))
  2. There is the cex argument to adjust the distance allowed, playing a bit with this (in relation to the point size) allows to define the degree of overlap. This works quite okay.
ggplot(DF, aes(value, country, fill = country, size = size_dot)) +
  scale_size(range = c(1,20)) +
  ggbeeswarm::geom_beeswarm(shape = 21, alpha = .8,
                            cex = 5,                    # "distance" of the points
                            priority = "descending",    # you can play with the sorting as well
                            groupOnX = FALSE, 
                            ) +
  theme_minimal() +
  theme(legend.position = "none")

Thanks a lot Matthias!

This looks much better, and it starts to become reasonable. I would need the solution to work automagically with any dataset, so it would be really great if, as you say, the cex parameter in the ggbeeswarm package was applied from the outline.

Thanks also for the scale_size() tip.

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.