ggplot of lda seems inverted

zard2022 · April 24, 2023, 11:58pm

Hello.

I am attempting to create a ggplot2 plot of a linear discriminant analysis (LDA) of my data. I have done so without issues in the past. However, the plotted data appear 'inverted': points that should fall below zero on the Y axis (that is, below the regression line, which I plotted separately beforehand as a frame of reference) appear above it, and vice versa.

My (modified) code consists of the following.

First, the initial plot, which gives an idea of which points lie above and below the regression line. I include it for completeness, in case I made an error in this part of the code.

##Create dataframe

Size<-c(6,6,6,8,8,8,10,10,10,12,12,12,15,15,15,6,6,8,8,8,10,10,10,12,12,12,15,15,15,6,6,6,8,10,10,10,12,12,12,15,15,6,8,8,8,10,10,10,12,12,15,15)

Category<-c("ClassII", "ClassII", "ClassII", "ClassII", "ClassII", "ClassII", "ClassII", "ClassII", "ClassII", "ClassII", "ClassII", "ClassII", "ClassII", "ClassII", "ClassII", "ClassIII", "ClassIII", "ClassIII", "ClassIII", "ClassIII", "ClassIII", "ClassIII", "ClassIII", "ClassIII", "ClassIII", "ClassIII", "ClassIII", "ClassIII", "ClassIII", "ClassI", "ClassI", "ClassI", "ClassI", "ClassI", "ClassI", "ClassI", "ClassI", "ClassI", "ClassI", "ClassI", "ClassI", "ClassIV", "ClassIV", "ClassIV", "ClassIV", "ClassIV", "ClassIV", "ClassIV", "ClassIV", "ClassIV", "ClassIV", "ClassIV")

H<-c(0.4597714,0.3384975,0.2438867,0.5773447,0.5424548,0.5225763,0.5773447,0.5424548,0.5225763,0.6188187,0.5979812,0.5321799,0.6028551,0.4706633,0.4867061,0.3674625,0.3430894,0.3102022,0.4380490,0.4037123,0.3904491,0.3952290,0.3964599,0.5618259,0.5479117,0.6004870,0.5838193,0.5983880,0.5864260,0.6313169,0.5161577,0.5822030,0.6525793,0.4346467,0.4190352,0.4248726,0.5149471,0.5433182,0.4797744,0.5149471,0.5433182,0.3071416,0.3227957,0.5113163,0.5167215,0.3055734,0.2595054,0.2697147,0.1945752,0.1844296,0.4543830,0.4506419)

D<-c(17.060473,17.247823,17.487762,14.783000,13.305876,11.955035,15.569631,16.330392,15.297604,13.801903,13.316480,12.114558,14.744418,16.776991,14.128221,42.428042,40.711409,45.048931,44.613229,34.386670,23.555482,24.578951,22.834340,16.106533,19.230402,18.609950,25.945419,17.957438,24.540131,9.217218,8.346780,8.350304,8.931497,7.871861,7.627603,8.483040,8.952785,7.902581,4.846481,9.441160,9.461342,34.636275,33.427111,36.670034,19.104717,34.539788,44.268683,38.370184,31.623433,33.561326,45.195551,27.661643)

data<-data.frame(Size,Category,H,D)

print(data)

##Create Regression Plot
library(ggplot2)
library(ggpubr) # provides stat_regline_equation() and stat_cor()
RegressionPlot <- ggplot(data, aes(x = D, y = H)) +
  # Refer to columns by name inside aes() rather than data$Size / data$Category
  geom_point(aes(color = Size, shape = Category), size = 4) +
  scale_color_gradient(breaks = c(6, 8, 10, 12, 15), low = "blue1", high = "red1") +
  xlab("D") + ylab("H") +
  theme_classic() + theme(legend.position = "none") +
  geom_smooth(method = "lm", formula = y ~ x) +
  stat_regline_equation(label.x = 30, label.y = .5) +
  stat_cor(label.x = 30, label.y = .4)
RegressionPlot

For the LDA plot, where I believe the error most likely lies:

library(MASS) # provides lda()
varsDH <- cbind(data$H, data$D)
post_hocDH <- lda(data$Category ~ varsDH, CV = FALSE)
# Keep Size and Category next to the LD scores so aes() can use column names
plot_ldaDHbyCategory <- data.frame(data[, c("Size", "Category", "H")], lda = predict(post_hocDH)$x)
ggplot(plot_ldaDHbyCategory) +
  geom_point(aes(x = lda.LD1, y = lda.LD2, color = Size, shape = Category), size = 4) +
  theme_classic() +
  scale_color_gradient(breaks = c(6, 8, 10, 12, 15), low = "blue1", high = "red1") +
  xlab("D/H ratio") + ylab("Deviation from regression line") +
  theme(legend.position = "none")

I would like to know where I am going wrong and how to fix the inverted deviation from 0 in my LDA plot: points that should deviate negatively appear as positive deviations, and vice versa.

Thank you.

mara · April 25, 2023, 11:25am

Could you please turn this into a self-contained reprex (short for reproducible example)? It will help us help you if we can be sure we're all working with/looking at the same stuff.

install.packages("reprex")

If you've never heard of a reprex before, you might want to start by reading the tidyverse.org help page. The reprex dos and don'ts are also useful.

There's a nice FAQ on how to do a minimal reprex for beginners, below:

FAQ: How to do a minimal reproducible example (reprex) for beginners (Guides & FAQs)

A minimal reproducible example consists of the following items:

1. A minimal dataset, necessary to reproduce the issue.
2. The minimal runnable code necessary to reproduce the issue, which can be run on the given dataset, and including the necessary information on the used packages.

Let's quickly go over each one of these with examples.

Minimal Dataset (Sample Data): You need to provide a data frame that is small enough to be (reasonably) pasted on a post, but big enough to reproduce your issue. Let's say, as an example, that you are working with the iris data frame: head(iris) …

For pointers specific to the community site, check out the reprex FAQ.

zard2022 · April 25, 2023, 3:39pm

I have updated the original question with a reproducible example. My apologies.

nirgrahamuk · April 26, 2023, 9:24am

Can you justify why lda.LD1 would be "D/H ratio" and lda.LD2 "Deviation from regression line"?
Aren't they simply the two linear discriminants that lda found for you, one after the other?
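
To make that concrete, here is a minimal sketch of what the LD columns contain. It assumes the built-in iris data as a stand-in for the data in this thread (two predictors, three classes); the names fit, ld_scores, centre, manual_scores and scores_match are only illustrative. As far as I can tell from how predict() behaves for lda objects, each score is a weighted sum of the centred predictors, with the weights stored in fit$scaling, so the axes carry no built-in meaning such as "ratio" or "deviation from a regression line".

```r
library(MASS) # provides lda()

# Assumption: iris stands in for the thread's data (two predictors, several classes)
fit <- lda(Species ~ Sepal.Length + Sepal.Width, data = iris)

# Each column holds the weights of one discriminant on the original variables
print(fit$scaling)

# Scores returned by predict(): one column per discriminant (LD1, LD2)
ld_scores <- predict(fit)$x

# Rebuild the scores by hand: centre the predictors on the prior-weighted
# mean of the group means, then multiply by the weights
X <- as.matrix(iris[, c("Sepal.Length", "Sepal.Width")])
centre <- colSums(fit$prior * fit$means)
manual_scores <- scale(X, center = centre, scale = FALSE) %*% fit$scaling

# TRUE when both routes agree up to floating-point error
scores_match <- isTRUE(all.equal(unname(manual_scores), unname(ld_scores),
                                 check.attributes = FALSE))
print(scores_match)
```

If you want to know what an axis "means" for your own data, looking at the corresponding column of the scaling matrix is probably the most direct route.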

zard2022 · May 1, 2023, 9:25pm

My understanding is that this is correct. However, LD2 should be orthogonal to LD1, and LD1 should be the linear function that yields the maximal separation of groups, which should incorporate D and H. If this is incorrect, could you explain why the generated LDA plot should be correct? If LD2 (the Y axis) is orthogonal to LD1, values that I would expect to be negative, that is, values falling below a regression line (which could also be generated from this dataset), appear positive, and vice versa. This does not seem correct, but if it is, I would appreciate an explanation.
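
One way to probe this expectation is sketched below. It again assumes the built-in iris data as a stand-in for the data above, and every object name in it is illustrative. The MASS documentation describes the scaling matrix as normalised so that the within-group covariance of the scores is spherical, which is a statement about the scores, not about perpendicular directions in the original variable space. The sign of each discriminant axis is also arbitrary: flipping one column of the scaling matrix leaves the posterior probabilities unchanged, so a mirrored axis is not by itself evidence of an error. If a particular orientation is wanted for plotting, the sign can be chosen to agree with a reference quantity such as regression residuals.

```r
library(MASS) # provides lda()

# Assumption: iris stands in for the thread's data
fit <- lda(Species ~ Sepal.Length + Sepal.Width, data = iris)
ld <- predict(fit)$x

# 1. Pooled within-group covariance of the scores (divisor n - number of groups)
groups <- split(as.data.frame(ld), iris$Species)
within_ss <- Reduce(`+`, lapply(groups, function(g) {
  crossprod(scale(as.matrix(g), center = TRUE, scale = FALSE))
}))
within_cov <- within_ss / (nrow(ld) - length(groups))
print(round(within_cov, 6)) # compare with the 2 x 2 identity matrix

# 2. Flip the sign of LD2 and compare the posterior probabilities
fit_flipped <- fit
fit_flipped$scaling[, "LD2"] <- -fit_flipped$scaling[, "LD2"]
same_posterior <- isTRUE(all.equal(predict(fit)$posterior,
                                   predict(fit_flipped)$posterior))
print(same_posterior)

# 3. Optional: orient LD2 so that it agrees in sign with a reference quantity,
#    here the residuals of a simple regression (chosen only for illustration)
res <- residuals(lm(Sepal.Width ~ Sepal.Length, data = iris))
flip <- if (cor(ld[, "LD2"], res) < 0) -1 else 1
ld2_aligned <- flip * ld[, "LD2"]
print(cor(ld2_aligned, res))
```

Even after aligning the sign, LD2 is still a discriminant score and not a regression residual, so the two will generally differ by more than orientation.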

system · May 22, 2023, 9:26pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.
