ggplot of LDA seems inverted


I am trying to create a ggplot2 plot of a linear discriminant analysis (LDA) of my data, something I have done without issues in the past. This time, however, the plotted data appear 'inverted': points that should fall below zero on the Y axis (that is, below the regression line I plotted separately as a frame of reference) appear above it, and vice versa.

My (modified) code consists of the following.

First, the initial plot, which gives an idea of which points lie above and below the regression line. I include it for completeness, in case I made an error in this part of the code.

Create dataframe


Category<-c("ClassII", "ClassII", "ClassII", "ClassII", "ClassII", "ClassII", "ClassII", "ClassII", "ClassII", "ClassII", "ClassII", "ClassII", "ClassII", "ClassII", "ClassII", "ClassIII", "ClassIII", "ClassIII", "ClassIII", "ClassIII", "ClassIII", "ClassIII", "ClassIII", "ClassIII", "ClassIII", "ClassIII", "ClassIII", "ClassIII", "ClassIII", "ClassI", "ClassI", "ClassI", "ClassI", "ClassI", "ClassI", "ClassI", "ClassI", "ClassI", "ClassI", "ClassI", "ClassI", "ClassIV", "ClassIV", "ClassIV", "ClassIV", "ClassIV", "ClassIV", "ClassIV", "ClassIV", "ClassIV", "ClassIV", "ClassIV")





## Create regression plot
library(ggplot2)
library(ggpubr)  # provides stat_regline_equation() and stat_cor()

RegressionPlot <- ggplot(data, aes(x = D, y = H)) +
  # Map columns by name inside aes() rather than with data$...
  geom_point(aes(color = Size, shape = Category), size = 4) +
  scale_color_gradient(breaks = c(6, 8, 10, 12, 15), low = "blue1", high = "red1") +
  xlab("D") +
  ylab("H") +
  theme_classic() +
  theme(legend.position = "none") +
  geom_smooth(method = "lm", formula = y ~ x) +
  stat_regline_equation(label.x = 30, label.y = .5) +
  stat_cor(label.x = 30, label.y = .4)

For the LDA plot, where I believe the error most likely lies:

library(MASS)  # provides lda()

varsDH <- cbind(data$H, data$D)
post_hocDH <- lda(data$Category ~ varsDH, CV = FALSE)
# Keep Size and Category in the plotting data frame so aes() can refer to them by name
plot_ldaDHbyCategory <- data.frame(H = data$H, Size = data$Size, Category = data$Category, lda = predict(post_hocDH)$x)
ggplot(plot_ldaDHbyCategory) +
  geom_point(aes(x = lda.LD1, y = lda.LD2, color = Size, shape = Category), size = 4) +
  theme_classic() +
  scale_color_gradient(breaks = c(6, 8, 10, 12, 15), low = "blue1", high = "red1") +
  xlab("D/H ratio") +
  ylab("Deviation from regression line") +
  theme(legend.position = "none")

I would like to know where I may be going wrong and how to fix the inverted deviation from 0 in my LDA plot: points that should deviate negatively appear as positive deviations, and vice versa.

Thank you.

Could you please turn this into a self-contained reprex (short for reproducible example)? It will help us help you if we can be sure we're all working with/looking at the same stuff.


If you've never heard of a reprex before, you might want to start by reading the help page. The reprex dos and don'ts are also useful.

There's also a nice FAQ on how to do a minimal reprex for beginners.

For pointers specific to the community site, check out the reprex FAQ.

I have updated the original question with a reproducible example. My apologies.

Can you justify why lda.LD1 would be "D/H ratio" and lda.LD2 "Deviation from regression line"?
Aren't they simply the two linear discriminants that lda found for you, one after the other?
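
To make that concrete, here is a minimal sketch of what the LD axes contain. It uses the built-in iris data as a stand-in, since the data frame from the question is not shown in full, so the column names below are assumptions and not the poster's variables. It suggests that the scores returned by predict() are the centred predictors multiplied by the coefficients stored in the scaling component of the fit.

```r
library(MASS)  # provides lda()

# Stand-in data (assumption): two numeric predictors and a three-level factor
# from the built-in iris data set, because the thread's data frame is not shown.
fit <- lda(Species ~ Sepal.Length + Petal.Length, data = iris)

# Coefficients that define LD1 and LD2 as weighted sums of the predictors
fit$scaling

# Rebuild the scores by hand: centre the predictors, then apply the coefficients
X <- as.matrix(iris[, c("Sepal.Length", "Petal.Length")])
centre <- colSums(fit$prior * fit$means)  # prior-weighted mean of the group means
manual_scores <- scale(X, center = centre, scale = FALSE) %*% fit$scaling

# Scores as returned by predict()
scores <- predict(fit)$x

# Compare the two sets of scores, ignoring row and column names
all.equal(unname(manual_scores), unname(scores))
```

Each LD is a weighted sum of the inputs chosen to separate the groups, so its direction and sign need not match a ratio of the inputs or a regression residual.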

My understanding is that this is correct. However, LD2 should be orthogonal to LD1, and LD1 should be the linear function that yields the maximal separation of groups, which should incorporate D and H. If this is incorrect, could you explain why the generated LDA plot should be correct? If LD2 (the Y axis) is orthogonal to LD1, then values that I would expect to be negative, that is, points falling below a regression line (which could also be generated from this dataset), appear positive, and vice versa. This does not seem correct, but if it is, I would appreciate an explanation.
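
One way to examine this expectation is to compute the regression residuals directly and compare them with LD2. The sketch below is a hedged illustration, not a diagnosis of the original data: iris columns stand in for D, H and Category (an assumption), and the names d, reg_resid and ld2_display are hypothetical. It relies on the fact that a discriminant direction is only defined up to sign, so negating a score column leaves the group separation unchanged and is purely a display choice.

```r
library(MASS)  # provides lda()

# Stand-in data (assumption): iris columns play the roles of D, H and Category.
d <- data.frame(D = iris$Sepal.Length, H = iris$Petal.Length, Category = iris$Species)

# Signed vertical deviation of each point from the fitted line H ~ D
reg_resid <- residuals(lm(H ~ D, data = d))

# LDA scores from the same two variables
fit <- lda(Category ~ H + D, data = d)
ld <- predict(fit)$x

# LD2 is a different quantity from the residual, so this correlation is not
# expected to be exactly +1 or -1; its sign shows which way LD2 happens to point.
r <- cor(reg_resid, ld[, "LD2"])

# Display choice: flip LD2 so that it is not negatively correlated with the residuals
ld2_display <- if (r < 0) -ld[, "LD2"] else ld[, "LD2"]
cor(reg_resid, ld2_display)
```

If the quantity wanted on the Y axis really is the deviation from the regression line, plotting reg_resid itself may be closer to the intent than plotting LD2.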
