I have been struggling with two persistent problems in a multi-panel boxplot figure. I would really appreciate any guidance.
FIGURE LAYOUT:
25-cell grid (5 columns × 5 rows)
Each cell = one boxplot panel
Cell 5 (top right) = custom legend
X-axis: 3 groups, with 6 color/shape combinations per group (dodged)
X-axis labels only on bottom row
Y-axis labels only — no panel titles
PROBLEM 1 — SIGNIFICANCE LETTERS NOT APPEARING
I want two types of letters:
UPPERCASE (A, B) for main effect from two-way ANOVA — only when p < 0.05
lowercase (a, b) for pairwise comparison within each x-group — only when significant
What I tried:
multcomp::cld() — crashes silently because one factor has only 2 levels
emmeans + multcomp — requires multcompView which caused install errors
TukeyHSD() + manual ifelse() — letters calculate correctly in console
but do not appear on the figure because the join between the
stats table and the ymax position table fails silently
What is the most reliable way to calculate and place significance letters
on dodged boxplots when one factor has only 2 levels?
PROBLEM 2 — LEGEND TOO LARGE
I am building a custom legend inside one grid cell using ggplot + geom_point
geom_text + theme_void. The legend is always too large and takes up
too much space relative to the data panels.
It is very hard to suggest fixes when you don't provide any sample code. Could you please post a reproducible example, ideally with a sample dataset, so that we can run it and identify the problem more easily? We could then also see which package you use to arrange the panels (patchwork, or simply facet_grid() from ggplot2?) and how you extract the results of your ANOVA.
However, I do have a suggestion that may make your plots more informative, especially since you seem to be using the ANOVA and the plot in an academic context. I really like the ggstatsplot package, which creates beautiful, publication-ready plots. My example code uses the penguins sample dataset.
library(ggstatsplot) # install it first if needed
grouped_ggbetweenstats(penguins, x = species, y = bill_len, grouping.var = sex, pairwise.display = "significant", type = "parametric")
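On the letters themselves, and without having seen your code: when a join between a stats table and a position table "fails silently", one common cause is that the key columns do not match exactly (factor in one table, character in the other, or different level names). Below is a minimal base-R sketch of one way to guard against that. Everything in it is an assumption for illustration: the toy data, the column names `grp`, `trt` and `y`, and the helper `letters_one_group()` are hypothetical, and the design is simplified to a single 2-level factor within each x-group.

```r
# Deterministic toy data: 3 x-groups, one 2-level factor (hypothetical names).
dat <- expand.grid(rep = 1:8, trt = c("ctrl", "treated"),
                   grp = c("G1", "G2", "G3"), stringsAsFactors = FALSE)
noise <- rep(c(-1.5, -1, -0.5, 0, 0, 0.5, 1, 1.5), times = 6)
# Treatment effect of +5 in G1 and G3, none in G2.
shift <- ifelse(dat$trt == "treated" & dat$grp != "G2", 5, 0)
dat$y <- 10 + shift + noise

# With only 2 levels, letters follow directly from the single Tukey p-value.
letters_one_group <- function(d, alpha = 0.05) {
  d$trt <- factor(d$trt)
  p <- TukeyHSD(aov(y ~ trt, data = d))$trt[1, "p adj"]
  means <- sapply(split(d$y, d$trt), mean)
  # "a" marks the larger mean; empty labels when the pair does not differ.
  lab <- if (p < alpha) ifelse(means == max(means), "a", "b") else rep("", 2)
  data.frame(grp = d$grp[1], trt = names(means), letter = unname(lab),
             stringsAsFactors = FALSE)
}
stats_tbl <- do.call(rbind, lapply(split(dat, dat$grp), letters_one_group))

# Position table: highest observation per box.
pos_tbl <- aggregate(y ~ grp + trt, data = dat, FUN = max)
names(pos_tbl)[3] <- "ymax"

# Force identical key types in both tables before joining.
key_cols <- c("grp", "trt")
stats_tbl[key_cols] <- lapply(stats_tbl[key_cols], as.character)
pos_tbl[key_cols]   <- lapply(pos_tbl[key_cols], as.character)
lab_tbl <- merge(pos_tbl, stats_tbl, by = key_cols, all.x = TRUE)

# Fail loudly if any box lost its label in the join.
stopifnot(nrow(lab_tbl) == nrow(pos_tbl), !anyNA(lab_tbl$letter))
print(lab_tbl)
```

If this matches your setup, `lab_tbl` could then be passed to something like geom_text(data = lab_tbl, aes(x = grp, y = ymax, label = letter, group = trt), position = position_dodge(width = 0.75), vjust = -0.5, inherit.aes = FALSE), where 0.75 is the default dodge width of geom_boxplot(). Whether that lines up with your boxes depends on how your own plot is dodged, which is another reason a reproducible example would help.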
This gives us formatted code that we can copy, paste and run. Often, people here do not have the time to type out code in order to test it and find a problem.
A handy way to supply data is the dput() function. Run dput(mydata), where "mydata" is the name of your dataset. For really large datasets, dput(head(mydata, 100)) will probably do. Paste it here between