Predicted value from lm not on the regression line in ggplot2

I'm making a very simple scatter plot of a data set and overlaying a linear trend line on it. I then highlight one particular observation along with its predicted value from a linear model, but for some reason the predicted value is NOT on the trend line that ggplot2 draws. I'm not sure what I'm doing wrong here. A reprex is below. Any thoughts would be greatly appreciated. Thanks!

library(tidyverse)
library(modelr)
library(ggrepel)

# Fit mpg ~ wt on the full data set and add the fitted values as a `pred` column
filter_data <- mtcars %>%
  arrange(wt) %>%
  rownames_to_column("model") %>%
  add_predictions(lm(mpg ~ wt, data = .))

# Observation of interest
corona <- filter_data %>% filter(model == "Toyota Corona")

# Highlight a data point
example <- filter_data %>%
  ggplot() +
  geom_point(aes(x = wt, y = mpg)) +
  geom_smooth(aes(x = wt, y = mpg), method = "lm", se = FALSE) +
  geom_point(data = corona, aes(x = wt, y = mpg), color = "red") +     # the true data point in red
  geom_point(data = corona, aes(x = wt, y = pred), color = "purple") + # the predicted value in purple
  xlim(2.0, 3.0) + ylim(20, 25)
example

# looking at the resulting figure, the purple point is NOT on the line. What's happening?

From the ggplot2 documentation on xlim, ylim and limits:

By default, any values outside the limits specified are replaced with NA. Be warned that this will remove data outside the limits and this can produce unintended results. For changing x or y axis limits without dropping data observations, see coord_cartesian().

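So by applying xlim() and ylim() you changed the data that geom_smooth() sees: points outside the limits are dropped before the line is fitted, so the trend line comes from a subset of mtcars, while your predicted value comes from a model fitted to all of it. Try coord_cartesian(xlim = c(2, 3), ylim = c(20, 25)) instead, which zooms the plot without dropping any data.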
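
Here is a minimal sketch of what that could look like. It is not your exact reprex: I'm assuming base R's predict() in place of modelr::add_predictions() and only loading ggplot2, to keep it self-contained. The formula = y ~ x argument just spells out the default that geom_smooth() would otherwise report in a message.

```r
library(ggplot2)

# Fit the model on the full data set (assumption: base R lm/predict
# instead of modelr::add_predictions, to keep dependencies small)
fit <- lm(mpg ~ wt, data = mtcars)

# Observation of interest, with its fitted value in a `pred` column
corona <- mtcars["Toyota Corona", ]
corona$pred <- predict(fit, newdata = corona)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  geom_point(data = corona, colour = "red") +                  # the true data point
  geom_point(data = corona, aes(y = pred), colour = "purple") + # the predicted value
  # Zoom in without dropping observations, so the smooth uses all rows
  coord_cartesian(xlim = c(2, 3), ylim = c(20, 25))
p
```

With the zoom done in coord_cartesian(), the smoothing layer is fitted to every row of mtcars, so the purple point should sit on the line.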

Thank you so much! That fixes it, though I'm not entirely sure why xlim and ylim caused the issue in the first place.
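
On the "why": one way to see it is to fit the same model twice, once on all of mtcars and once on only the rows that fall inside the limits. This is a rough sketch that mimics what the scale limits do with subset(); the names inside and zoom_fit are made up for illustration, and ggplot2's internals are of course not literally this code.

```r
# Model fitted to the full data set, as in the original add_predictions() call
full_fit <- lm(mpg ~ wt, data = mtcars)

# Rows that survive xlim(2, 3) and ylim(20, 25); everything else becomes NA
# and is removed before the smooth is computed
inside <- subset(mtcars, wt >= 2 & wt <= 3 & mpg >= 20 & mpg <= 25)

# Model fitted only to the surviving rows, which is roughly what the
# smoothing layer ends up drawing when xlim()/ylim() are used
zoom_fit <- lm(mpg ~ wt, data = inside)

nrow(mtcars)    # rows used for the purple point
nrow(inside)    # rows left for the trend line
coef(full_fit)
coef(zoom_fit)
```

The two sets of coefficients differ, so the purple point (from the full-data model) has no reason to lie on a line drawn from the reduced data.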
