Putting two lm+pred datasets into one ggplot

Trying to figure out how to combine 2 graphs into 1 plot.
My data set is something like:

I want to plot a regression line (using lm) plus the prediction interval (using predict) of two subsets of my data.
Say I want label==A and label==B. These subsets (A and B) do not have the same number of rows.
So the plot should show two regression lines, each with its own prediction interval.
Doing this for the confidence interval is easy (se=TRUE within geom_smooth), but the prediction interval needs to be calculated separately.
Something like:

subsetmydata <- subset(mydata, label=="A" | label=="B")
fit <- lm(y ~ x, data=subsetmydata)
prediction_fit <- predict(fit, interval="prediction", level=0.95)
combined_analysis <- data.frame(subsetmydata, prediction_fit)

ggplot(combined_analysis, aes(x=x, y=y, linetype=label)) + .....

This, however, only calculates the prediction interval for the combined subset, not for each of the two subsets separately.

Is there a way to do the lm and predict calculations separately for each subset, then combine the results into one dataset and use that in ggplot?


It always helps if you have a minimal reproducible example with data as we can then take your code directly and work with it.

Within ggplot you can stack layers for the whole data set and for each group. In this example, geom_jitter just draws the points, and two linear-model layers are plotted on top of them. geom_smooth(colour = "black", method = "lm", se = FALSE) draws one overall line, and geom_smooth(aes(colour = group), method = "lm", se = FALSE) draws two lines, one for each group. As long as you have a column indicating group A or B, you can put that variable where I wrote group and ggplot will draw a line for each.

ggplot(df, aes(x = var1, y = var2)) +
  geom_jitter(aes(colour = group)) +
  geom_smooth(aes(colour = group), method = "lm", se = FALSE) +
  geom_smooth(colour = "black", method = "lm", se = FALSE)
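
For a version that runs on its own, here is a minimal sketch with simulated data. The data frame df and the columns var1, var2 and group are placeholders invented for illustration, not the poster's data.

```r
library(ggplot2)

# Simulated data (an assumption, for illustration only): two groups with different slopes
set.seed(1)
df <- data.frame(
  var1  = rep(1:30, times = 2),
  group = rep(c("A", "B"), each = 30)
)
df$var2 <- ifelse(df$group == "A", 2 + 0.5 * df$var1, 10 + 1.5 * df$var1) +
  rnorm(nrow(df), sd = 3)

p <- ggplot(df, aes(x = var1, y = var2)) +
  geom_jitter(aes(colour = group)) +                             # raw points, coloured by group
  geom_smooth(aes(colour = group), method = "lm", se = FALSE) +  # one regression line per group
  geom_smooth(colour = "black", method = "lm", se = FALSE)       # one overall regression line

print(p)
```

The colour = group mapping inside aes() is what splits the fit by group; the black layer has no such mapping, so it is fitted to all rows.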

Does this help?

I guess I'm not well versed enough in R yet to fully understand. I also thought about using the parameter, but I can't seem to find a way to combine my two subsets, each with its own prediction interval.
Let me try to give you some more info.

First an example of my data:

Actually, there are more labels, but I thought I'd keep it to two for the example.
This is what I do now:

  1. Using Shiny with two radio button groups to select one or two diagnoses

  2. When you select diagnosis1 or diagnosis2, I move that subset of the data (containing only that diagnosis) into dataset1 or dataset2, so I then have two data sets ready

  3. Then the regression: fit1 <- lm(subset1$nfl ~ subset1$age)

  4. Then the prediction: pred1 <- predict(fit1, interval="prediction", level=0.95)

  5. Then combine these: plotdata1 <- data.frame(subset1, pred1)

  6. Then I repeat this for the second data set.

  7. Now I use ggplot:

    # plotdata1 and plotdata2 hold the age, nfl, fit, lwr and upr columns for each diagnosis
    p <- ggplot() +
      geom_ribbon(data = plotdata1, aes(x = age, ymin = lwr, ymax = upr), fill = "#f0f0f0") +
      geom_smooth(data = plotdata1, aes(x = age, y = nfl), method = lm, se = FALSE) +
      geom_ribbon(data = plotdata2, aes(x = age, ymin = lwr, ymax = upr), fill = "#ff0000") +
      geom_smooth(data = plotdata2, aes(x = age, y = nfl), method = lm, se = FALSE)


Now this works. However, I can't seem to control the legend now. Nothing seems to be able to override the labels in the legend.
Or perhaps it would be better to combine the two plotdata sets (1 and 2) into one set and use something like <aes(group=diagnosis)> so ggplot can create both plots by itself. That would also make it easier to add more data sets if needed!

So I guess my main question would be: how do I combine both lm+predict results into one single plot so I can use something like <aes(group=diagnosis)> to create both lines and prediction intervals?
Or, if I stick to the two-line plot I've got working now: how do I manage the legend?

Sorry for the messy explanation!

Thanks for all your help!


Something like this?
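
One way to get there, as a minimal sketch rather than a definitive answer: run lm and predict separately for each diagnosis, bind the results into one data frame, and map diagnosis to colour and fill so that ggplot draws both lines and both ribbons and the legend can be set through the manual scales. Only the column names nfl, age and diagnosis come from this thread; the simulated mydata, the labels "A" and "B", the colours, and the helper add_pred_interval are assumptions made up for illustration.

```r
library(ggplot2)

# Simulated stand-in for the real data (assumption): two diagnoses, unequal group sizes
set.seed(42)
mydata <- data.frame(
  diagnosis = rep(c("A", "B"), times = c(40, 25)),
  age       = c(runif(40, 30, 80), runif(25, 30, 80))
)
mydata$nfl <- ifelse(mydata$diagnosis == "A", 5 + 0.3 * mydata$age, 20 + 0.6 * mydata$age) +
  rnorm(nrow(mydata), sd = 5)

# Hypothetical helper: fit one group and attach its fit / lwr / upr columns
add_pred_interval <- function(d, level = 0.95) {
  fit  <- lm(nfl ~ age, data = d)
  pred <- predict(fit, newdata = d, interval = "prediction", level = level)
  cbind(d, pred)
}

# Fit each diagnosis separately, then stack the results into one data frame
plotdata <- do.call(rbind, lapply(split(mydata, mydata$diagnosis), add_pred_interval))

p <- ggplot(plotdata, aes(x = age, colour = diagnosis, fill = diagnosis)) +
  geom_ribbon(aes(ymin = lwr, ymax = upr), alpha = 0.2, colour = NA) +  # prediction interval per group
  geom_point(aes(y = nfl)) +                                            # observed values
  geom_line(aes(y = fit)) +                                             # regression line per group
  # Legend title, entries and colours are controlled in one place via the manual scales
  scale_colour_manual(name = "Diagnosis", values = c(A = "grey30", B = "red"),
                      breaks = c("A", "B"), labels = c("Diagnosis A", "Diagnosis B")) +
  scale_fill_manual(name = "Diagnosis", values = c(A = "grey30", B = "red"),
                    breaks = c("A", "B"), labels = c("Diagnosis A", "Diagnosis B"))

print(p)
```

Because the legend now comes from the diagnosis mapping, adding another diagnosis should only need more rows in the data plus one more entry in values, breaks and labels.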
