ggplot generates legends only when you create an aesthetic mapping inside aes. This is usually done by mapping a data column to an aesthetic, like colour, shape, or fill. ggplot is also set up to work most easily with data in "long" format. In your case, that would mean stacking the dv and sim columns and adding an additional column that marks whether a value came from dv or sim. Below, we'll do that with the gather function.
library(tidyverse)
theme_set(theme_classic())
# Fake data
set.seed(2)
dat = data.frame(othercolumn = sample(LETTERS, 100, replace=TRUE),
                 dv = rnorm(100, 10, 3),
                 sim = rnorm(100, 11, 2))
# convert data to long format
dat.l = gather(dat, key, value, dv, sim)
Note that we now have the numeric data in a single column called value and a categorical column called key that tells us where the data came from.
dat.l[c(1:5,101:105), ]
    othercolumn key     value
1             E  dv  7.485139
2             S  dv 16.198904
3             O  dv  8.313259
4             E  dv 13.827147
5             Y  dv  6.857282
101           E sim 12.951781
102           S sim 10.661154
103           O sim 12.444384
104           E sim  9.311163
105           Y sim 13.554587
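As a side note, in tidyr 1.0.0 and later gather is superseded by pivot_longer (gather still works). Here is a rough sketch of the same reshaping with pivot_longer, assuming a recent tidyr is installed. The name dat.l2 is just a placeholder I made up so it doesn't overwrite dat.l.

```r
library(tidyr)

# Same fake data as above
set.seed(2)
dat = data.frame(othercolumn = sample(LETTERS, 100, replace=TRUE),
                 dv = rnorm(100, 10, 3),
                 sim = rnorm(100, 11, 2))

# Stack dv and sim: the old column names go into "key",
# the numbers go into "value"
dat.l2 = pivot_longer(dat, cols = c(dv, sim),
                      names_to = "key", values_to = "value")
```

One difference to be aware of: pivot_longer returns a tibble and interleaves the rows (dv and sim for the first original row, then the second, and so on) rather than stacking all dv rows above all sim rows. That doesn't matter for plotting.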
To plot the data, we set the x aesthetic to value (we could have written x=value, but x is the first argument of aes, so we can just type value) and the colour aesthetic to key inside aes, which generates a legend. We set custom colours using scale_colour_manual and use theme to set a custom legend position.
ggplot(dat.l, aes(value, colour=key)) +
  geom_density() +
  labs(colour="Type",
       x="Concerta Peak1 Cmax Distribution",
       y="Density") +
  scale_colour_manual(values=c("blue", "red")) +
  theme(legend.position=c(0.9, 0.9))
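A note on the legend position: as far as I know, ggplot2 3.5.0 and later deprecate passing numeric coordinates directly to legend.position and instead expect legend.position="inside" together with legend.position.inside. A minimal sketch of the same plot under that assumption (it needs ggplot2 >= 3.5.0; on older versions keep the code above):

```r
library(ggplot2)
library(tidyr)

# Same fake data and reshaping as above
set.seed(2)
dat = data.frame(othercolumn = sample(LETTERS, 100, replace=TRUE),
                 dv = rnorm(100, 10, 3),
                 sim = rnorm(100, 11, 2))
dat.l = gather(dat, key, value, dv, sim)

p = ggplot(dat.l, aes(value, colour=key)) +
  geom_density() +
  labs(colour="Type",
       x="Concerta Peak1 Cmax Distribution",
       y="Density") +
  scale_colour_manual(values=c("blue", "red")) +
  # Assumes ggplot2 >= 3.5.0: place the legend inside the panel,
  # at 90% of the width and 90% of the height
  theme(legend.position="inside",
        legend.position.inside=c(0.9, 0.9))

print(p)
```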
Instead of creating dat.l as a separate object, we could have converted the data to long format on the fly:
dat %>%
  gather(key, value, dv, sim) %>%
  ggplot(aes(value, colour=key)) +
    geom_density() +
    labs(colour="Type",
         x="Concerta Peak1 Cmax Distribution",
         y="Density") +
    scale_colour_manual(values=c("blue", "red")) +
    theme(legend.position=c(0.9, 0.9))
With your original data, to get two density plots, we need two calls to geom_density. We can also create a legend with artificial "dummy" aesthetics, as done below by setting colour="dv" and colour="sim" inside aes (we could have used any strings instead of "dv" and "sim"). This "works", but requires more work and doesn't maintain a natural mapping between the data and the plot.
ggplot(dat) +
  geom_density(aes(dv, colour="dv")) +
  geom_density(aes(sim, colour="sim")) +
  labs(colour="Type",
       x="Concerta Peak1 Cmax Distribution",
       y="Density") +
  scale_colour_manual(values=c("blue", "red")) +
  theme(legend.position=c(0.9, 0.9))
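One thing to watch with the dummy-aesthetic approach: the strings become the levels of the colour scale, and an unnamed values vector is matched to those levels in sorted order, not in the order of the geom_density calls. If you pick other strings, naming the colours pins each one to its label. A small sketch, where "Observed" and "Simulated" are hypothetical labels I chose only for illustration:

```r
library(ggplot2)

# Same kind of fake data as above (othercolumn is not needed here)
set.seed(2)
dat = data.frame(dv = rnorm(100, 10, 3),
                 sim = rnorm(100, 11, 2))

p = ggplot(dat) +
  geom_density(aes(dv, colour="Observed")) +
  geom_density(aes(sim, colour="Simulated")) +
  labs(colour="Type",
       x="Concerta Peak1 Cmax Distribution",
       y="Density") +
  # Named vector: the order here no longer matters,
  # each label gets the colour it is named with
  scale_colour_manual(values=c(Simulated="red", Observed="blue"))

print(p)
```

You can check which colour each layer ended up with using layer_data(p, 1) and layer_data(p, 2).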
For all of these versions of the code, the plot looks like this: