Adding mean and median to a boxplot

Hello, I have made this graph and I would like to add two lines that cross the whole graph: one for the overall mean and one for the overall median.

library(dplyr)    # provides %>%
library(ggplot2)
library(ggeasy)   # provides easy_rotate_labels()

DATA %>%
  ggplot(aes(x = CNT, y = PVTOTAL)) +
  geom_boxplot() +
  easy_rotate_labels()

Here is a simple example of what I think you want.

DF <- data.frame(Name = rep(c("A","B","C"), 30), Value = rnorm(90))
head(DF)
#>   Name      Value
#> 1    A  0.5969224
#> 2    B  1.1423760
#> 3    C  0.1074961
#> 4    A -1.5284152
#> 5    B -0.7479784
#> 6    C -0.3417590
DF_Stats <- data.frame(Stat = c("Mean","Median"),
                       Stat_Value = c(mean(DF$Value), median(DF$Value)))
DF_Stats
#>     Stat Stat_Value
#> 1   Mean -0.2168492
#> 2 Median -0.3358145
library(ggplot2)
ggplot(DF, aes(Name, Value)) + geom_boxplot() +
  geom_hline(data = DF_Stats, 
             mapping = aes(yintercept = Stat_Value, color = Stat))

Created on 2023-05-19 with reprex v2.0.2

Hi @FJCC,
Is it possible to add labels showing the actual values of the mean and median? It is difficult to read their values off the plot.
Is it also possible to add a mean for each group, in yellow, so that we have it in addition to the grand mean?

Here is one version of displaying the values of the global mean and median and adding a data point to each boxplot showing the mean.

DF <- data.frame(Name = rep(c("A","B","C"), 30), Value = rnorm(90))
head(DF)
#>   Name       Value
#> 1    A -1.05411137
#> 2    B -0.67697338
#> 3    C -0.80538918
#> 4    A  0.34424211
#> 5    B -0.05592012
#> 6    C -0.41565611
DF_Stats <- data.frame(Stat = c("Mean","Median"),
                       Stat_Value = c(mean(DF$Value), median(DF$Value)))
DF_Stats
#>     Stat Stat_Value
#> 1   Mean -0.1115978
#> 2 Median -0.1412545
library(ggplot2)
ggplot(DF, aes(Name, Value)) + geom_boxplot() +
  geom_hline(data = DF_Stats, 
             mapping = aes(yintercept = Stat_Value, color = Stat)) +
  stat_summary(fun = mean, geom="point") +
  geom_text(aes(x = c(1.5, 2.5), y = 1, label = round(Stat_Value,3), color = Stat),
            data = DF_Stats, show.legend = FALSE)

Created on 2023-05-20 with reprex v2.0.2

Thank you. When I change geom="point" to geom="line" in stat_summary, nothing is displayed.

Instead, the following warning is displayed:
"geom_line(): Each group consists of only one observation.
Do you need to adjust the group aesthetic?"

Why is that?

Because the x axis is categorical, ggplot groups the data by that variable. After stat_summary has computed the mean, each group contains only one value, so there is nothing for a line to connect. This can be fixed by including group = 1 in the aes(), which puts all the means in a single group.

ggplot(DF, aes(Name, Value)) + geom_boxplot() +
  geom_hline(data = DF_Stats, 
             mapping = aes(yintercept = Stat_Value, color = Stat)) +
  stat_summary(fun = mean, geom="line", mapping = aes(group = 1)) +
  geom_text(aes(x = c(1.5, 2.5), y = 1, label = round(Stat_Value,3), color = Stat),
            data = DF_Stats, show.legend = FALSE)

Why can the median inside each box (the black line) be drawn as a line, while the mean only works when the geom is set to "point"?
I do not understand "Each group consists of only one observation", as there are 3 groups (A, B, C) in the Name variable.
What am I missing?

My desired result:

The mean can be a line, it is just more work. You can use geom_linerange. The first category on the x axis is at position 1, the second is at 2, and so on. I used this fact to add an X column to the Means data frame.
There may be a better way to do this and this is very manual.
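As a small sketch of the position idea, the X column can be derived from the category order instead of being typed in as 1:3. This assumes the default ordering, where a character column is placed on the axis in sorted order; a factor with reordered levels or custom scale limits would need the same order applied here. It uses base R's aggregate() rather than dplyr, purely to keep the sketch self-contained.

```r
# Example data in the same shape as DF in this thread
DF <- data.frame(Name = rep(c("A", "B", "C"), 30), Value = rnorm(90))

# One mean per category
Means <- aggregate(Value ~ Name, data = DF, FUN = mean)
names(Means)[2] <- "Mean"

# The integer code of each level is the category's position on the x axis
# (assumes the default, sorted ordering of the categories)
Means$X <- as.integer(factor(Means$Name))
Means
```

With X computed this way, the xmin = X - 0.4 and xmax = X + 0.4 mappings keep working if categories are added or removed.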

library(ggplot2)
library(dplyr)
DF <- data.frame(Name = rep(c("A","B","C"), 30), Value = rnorm(90))
head(DF)
DF_Stats <- data.frame(Stat = c("Mean","Median"),
                       Stat_Value = c(mean(DF$Value), median(DF$Value)))
Means <- DF |> group_by(Name) |> summarize(Mean = mean(Value)) |> 
  mutate(X = 1:3)

ggplot(DF, aes(Name, Value)) + geom_boxplot(width = 0.8) +
  geom_hline(data = DF_Stats, 
             mapping = aes(yintercept = Stat_Value, color = Stat)) +
  geom_linerange(data = Means, 
                 mapping = aes(x = Name, y = Mean, xmin = X - 0.4, xmax = X + 0.4),
                 color = "blue", linewidth = 1) +  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  geom_text(aes(x = c(1.5, 2.5), y = 1, label = round(Stat_Value,3), color = Stat),
            data = DF_Stats, show.legend = FALSE)
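A less manual alternative, offered only as a sketch and not the method used above, is to let stat_summary draw a "crossbar" whose lower edge, middle and upper edge are all set to the mean. The box then collapses to a single horizontal line per group, so no X column is needed. This assumes ggplot2 3.3.0 or later, where the arguments are named fun, fun.min and fun.max.

```r
library(ggplot2)

DF <- data.frame(Name = rep(c("A", "B", "C"), 30), Value = rnorm(90))

p <- ggplot(DF, aes(Name, Value)) +
  geom_boxplot(width = 0.8) +
  # fun, fun.min and fun.max are all the mean, so the crossbar has zero
  # height and appears as one horizontal line at each group's mean
  stat_summary(fun = mean, fun.min = mean, fun.max = mean,
               geom = "crossbar", width = 0.8, color = "blue")
p
```

The width of 0.8 matches the boxplot width so the mean line spans each box; the global geom_hline and geom_text layers can be added to p as before.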

My comment "Each group consists of only one observation" refers to the data after stat_summary has calculated the mean. There is only one mean value at each x position.
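One way to see this is to inspect the data that a layer actually receives, using ggplot2's layer_data(). The sketch below rebuilds only the stat_summary layer; the object names d_default and d_single are made up for illustration.

```r
library(ggplot2)

DF <- data.frame(Name = rep(c("A", "B", "C"), 30), Value = rnorm(90))

# Default grouping: one group per category, one summarised row per group
d_default <- layer_data(
  ggplot(DF, aes(Name, Value)) + stat_summary(fun = mean, geom = "point")
)
d_default[, c("x", "group", "y")]

# With group = 1 the three means share a single group,
# so geom = "line" has three points to connect
d_single <- layer_data(
  ggplot(DF, aes(Name, Value)) +
    stat_summary(fun = mean, geom = "line", mapping = aes(group = 1))
)
d_single[, c("x", "group", "y")]
```

In the first result each group value appears on exactly one row, which is what the warning describes; in the second, all three rows carry the same group value.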

Thank you for patiently explaining this to me. I did not know about geom_linerange(), so I have learnt something new. Much appreciated.
