Ggbeeswarm plot

I have created this plot:

using this code:


library(ggplot2)
library(ggbeeswarm)

ggplot(data = dats2,
       aes(x = sex, y = age)) +
  theme_bw(base_size = 16) +
  geom_quasirandom(col = "black", varwidth = TRUE, groupOnX = TRUE, alpha = 3/4, size = 2)

How should I interpret the horizontal position of the points, that is, their distance from the midline of each group and from the value 1.5 on the x-axis?

The sex variable is categorical: 1 = women, 2 = men.
The age variable is numerical.

How do I read this plot?

What is the difference between the points I have circled in red?

Hard to tell, but it looks like you are not plotting sex as a categorical variable.

See the difference in the x-axis here:


library(ggplot2)
library(ggbeeswarm)
dat1 <- data.frame(sex = as.factor(sample(1:2, 20, replace = TRUE)), yy = rnorm(20))
ggplot(dat1, aes(sex, yy)) + geom_beeswarm()

The points are spread more or less randomly along the x-axis, and the spread is wider where more points fall in the same range of y values. So the points at 2.25 aren't "more men" than the points at 1.75: everything in the cloud around 2 is male, and the other cloud is female.
You could also use categorical labels on x stating "female" and "male" directly.
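
A minimal sketch of that relabelling, assuming the coding from the question (1 and 2 as the two sex codes). The data frame `dat2` and its values are simulated for illustration only and are not the original data; `groupOnX` is left out here on the assumption that the package picks the grouping axis itself when x is a factor.

```r
library(ggplot2)
library(ggbeeswarm)

set.seed(1)
# Simulated stand-in for the data in the question (hypothetical values)
dat2 <- data.frame(
  sex = sample(1:2, 200, replace = TRUE),
  age = round(rnorm(200, mean = 50, sd = 10))
)

# Turn the numeric codes into a labelled factor so the x-axis becomes discrete
dat2$sex <- factor(dat2$sex, levels = c(1, 2), labels = c("female", "male"))

p_labelled <- ggplot(dat2, aes(x = sex, y = age)) +
  theme_bw(base_size = 16) +
  geom_quasirandom(colour = "black", varwidth = TRUE, alpha = 3/4, size = 2)
p_labelled
```

With a factor on x, the axis shows only the two group labels, so values such as 1.5 or 2.25 no longer appear and cannot be misread as data.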

Thank you both for your kind replies. So can we say that it is another form of scatter plot that shows the distribution of the data differently?

Kind of. The idea is to avoid overplotting, so that all data points are shown even where they would overlap.
If you replaced it with geom_point(), you would just see a black line between 40 and 60, because many data points overlap there. With the beeswarm, overlapping points are moved apart along the x-axis. The width of the cloud reflects the degree of overlap and therefore the distribution.
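
To see the effect side by side, here is a rough sketch on simulated data (the `demo` data frame and its values are made up for illustration). It counts how many points would land exactly on top of another point, then builds the same plot once with geom_point() and once with geom_quasirandom().

```r
library(ggplot2)
library(ggbeeswarm)

set.seed(42)
# Simulated data (hypothetical): integer ages, so many points share the same y value
demo <- data.frame(
  sex = factor(sample(c("female", "male"), 300, replace = TRUE)),
  age = round(rnorm(300, mean = 50, sd = 8))
)

# Points that coincide exactly with an earlier point in the same group
n_overlapping <- sum(duplicated(demo[, c("sex", "age")]))

# geom_point(): each group sits on one vertical line, so points hide each other
p_point <- ggplot(demo, aes(sex, age)) +
  geom_point(alpha = 3/4, size = 2)

# geom_quasirandom(): overlapping points are shifted sideways within their group
p_swarm <- ggplot(demo, aes(sex, age)) +
  geom_quasirandom(alpha = 3/4, size = 2)

p_point
p_swarm
```

The y values are identical in both plots; only the horizontal offsets differ, and those offsets carry no information beyond how crowded a given age range is.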
