ggplot of LDA seems inverted

Hello.

I am attempting to create a ggplot2 plot of a linear discriminant analysis (LDA) of my data, something I have done without issues in the past. This time, however, the plotted data appears 'inverted': points that should fall below zero on the Y axis (that is, below the regression line, which I plotted separately beforehand as a frame of reference) appear above it, and vice versa.

My (modified) code consists of the following.

The initial plot gives an idea of which points lie above and below the regression line. I include it for completeness, in case I made an error in this part of the code.

Create dataframe

Size<-c(6,6,6,8,8,8,10,10,10,12,12,12,15,15,15,6,6,8,8,8,10,10,10,12,12,12,15,15,15,6,6,6,8,10,10,10,12,12,12,15,15,6,8,8,8,10,10,10,12,12,15,15)

Category<-c("ClassII", "ClassII", "ClassII", "ClassII", "ClassII", "ClassII", "ClassII", "ClassII", "ClassII", "ClassII", "ClassII", "ClassII", "ClassII", "ClassII", "ClassII", "ClassIII", "ClassIII", "ClassIII", "ClassIII", "ClassIII", "ClassIII", "ClassIII", "ClassIII", "ClassIII", "ClassIII", "ClassIII", "ClassIII", "ClassIII", "ClassIII", "ClassI", "ClassI", "ClassI", "ClassI", "ClassI", "ClassI", "ClassI", "ClassI", "ClassI", "ClassI", "ClassI", "ClassI", "ClassIV", "ClassIV", "ClassIV", "ClassIV", "ClassIV", "ClassIV", "ClassIV", "ClassIV", "ClassIV", "ClassIV", "ClassIV")

H<-c(0.4597714,0.3384975,0.2438867,0.5773447,0.5424548,0.5225763,0.5773447,0.5424548,0.5225763,0.6188187,0.5979812,0.5321799,0.6028551,0.4706633,0.4867061,0.3674625,0.3430894,0.3102022,0.4380490,0.4037123,0.3904491,0.3952290,0.3964599,0.5618259,0.5479117,0.6004870,0.5838193,0.5983880,0.5864260,0.6313169,0.5161577,0.5822030,0.6525793,0.4346467,0.4190352,0.4248726,0.5149471,0.5433182,0.4797744,0.5149471,0.5433182,0.3071416,0.3227957,0.5113163,0.5167215,0.3055734,0.2595054,0.2697147,0.1945752,0.1844296,0.4543830,0.4506419)

D<-c(17.060473,17.247823,17.487762,14.783000,13.305876,11.955035,15.569631,16.330392,15.297604,13.801903,13.316480,12.114558,14.744418,16.776991,14.128221,42.428042,40.711409,45.048931,44.613229,34.386670,23.555482,24.578951,22.834340,16.106533,19.230402,18.609950,25.945419,17.957438,24.540131,9.217218,8.346780,8.350304,8.931497,7.871861,7.627603,8.483040,8.952785,7.902581,4.846481,9.441160,9.461342,34.636275,33.427111,36.670034,19.104717,34.539788,44.268683,38.370184,31.623433,33.561326,45.195551,27.661643)

data<-data.frame(Size,Category,H,D)

print(data)

## Create regression plot
library(ggplot2)  # ggplot(), geom_point(), geom_smooth()
library(ggpubr)   # stat_regline_equation(), stat_cor()

RegressionPlot <- ggplot(data, aes(x = D, y = H)) +
  # Use bare column names inside aes(); data$... bypasses the plot's own data
  geom_point(aes(color = Size, shape = Category), size = 4) +
  scale_color_gradient(breaks = c(6, 8, 10, 12, 15), low = "blue1", high = "red1") +
  xlab("D") +
  ylab("H") +
  theme_classic() +
  theme(legend.position = "none") +
  geom_smooth(method = "lm", formula = y ~ x) +
  stat_regline_equation(label.x = 30, label.y = 0.5) +
  stat_cor(label.x = 30, label.y = 0.4)
RegressionPlot
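As a rough cross-check on which points lie above or below the fitted line, the sign of each residual from `lm(H ~ D)` can be read off directly. The sketch below is only an illustration: it uses a few hypothetical toy values loosely based on the posted data rather than the full data frame, and the column name `position` is made up for the example.

```r
# Hypothetical toy values standing in for the D and H columns above
D <- c(17.06, 14.78, 42.43, 23.56, 9.22, 34.64)
H <- c(0.460, 0.577, 0.367, 0.390, 0.631, 0.307)

# Same model that geom_smooth(method = "lm", formula = y ~ x) draws
fit <- lm(H ~ D)

# Residual = observed H minus fitted H; positive means above the line
res <- residuals(fit)
position <- ifelse(res > 0, "above", "below")

print(data.frame(D, H, residual = round(res, 3), position))
```

Running the same two lines (`lm()` and `residuals()`) on the full `data` frame would give one residual per row, which can then be compared with whatever ends up on the Y axis of the LDA plot.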

For the LDA plot, where I believe the error most likely lies:

library(MASS)  # lda()

varsDH <- cbind(data$H, data$D)
post_hocDH <- lda(data$Category ~ varsDH, CV = FALSE)
# Keep Size and Category next to the scores so aes() can use bare column names
plot_ldaDHbyCategory <- data.frame(data[, c("H", "Size", "Category")], lda = predict(post_hocDH)$x)
ggplot(plot_ldaDHbyCategory) +
  geom_point(aes(x = lda.LD1, y = lda.LD2, color = Size, shape = Category), size = 4) +
  theme_classic() +
  scale_color_gradient(breaks = c(6, 8, 10, 12, 15), low = "blue1", high = "red1") +
  xlab("D/H ratio") +
  ylab("Deviation from regression line") +
  theme(legend.position = "none")

I would like to know where I may be going wrong and how to fix the inverted deviation from 0 in my LDA plot: points that should deviate negatively appear as positive deviations, and vice versa.

Thank you.

Could you please turn this into a self-contained reprex (short for reproducible example)? It will help us help you if we can be sure we're all working with/looking at the same stuff.

install.packages("reprex")

If you've never heard of a reprex before, you might want to start by reading the tidyverse.org help page. The reprex dos and don'ts are also useful.

There's also a nice FAQ on how to do a minimal reprex for beginners.

For pointers specific to the community site, check out the reprex FAQ.

I have updated the original question with a reproducible example. My apologies.

Can you justify why lda.LD1 would be the "D/H ratio" and lda.LD2 the "Deviation from regression line"?
Aren't they simply the two linear discriminants that lda() found for you, one after the other?
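One way to see what the two discriminants actually are is to look at the coefficients stored in the `scaling` element of the fitted object. The sketch below is only illustrative: it uses the built-in `iris` data as a stand-in for the poster's data (an assumption made so the example runs on its own), with two predictors and three groups. The MASS documentation describes `scaling` as normalized so that the within-group covariance is spherical, so the two coefficient vectors need not be at right angles in the original predictor space.

```r
library(MASS)  # provides lda()

# Stand-in data: two predictors, three groups, so two discriminants
fit <- lda(Species ~ Sepal.Length + Sepal.Width, data = iris)

# Each column holds the coefficients of one discriminant,
# i.e. the weights applied to the (centered) predictors
scaling <- fit$scaling
print(scaling)

# Cosine of the angle between the LD1 and LD2 coefficient vectors;
# 0 would mean the two directions are orthogonal in predictor space
cos_angle <- sum(scaling[, 1] * scaling[, 2]) /
  sqrt(sum(scaling[, 1]^2) * sum(scaling[, 2]^2))
print(cos_angle)
```

For the model in the question, the analogous object would be `post_hocDH$scaling`, with one row for each of H and D.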

My understanding is that this is correct. However, LD2 should be orthogonal to LD1, and LD1 should be the linear function that yields the maximal separation of groups, which should incorporate D and H. If this is incorrect, could you explain why the generated LDA plot should be correct? If LD2 (the Y axis) is orthogonal to LD1, then values I would expect to be negative, that is, points falling below a regression line (which could also be generated from this dataset), appear positive, and vice versa. This does not seem correct, but if it is, I would appreciate an explanation.
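One thing that may be worth checking here: the sign of a discriminant axis is arbitrary, because a direction multiplied by -1 separates the groups equally well, so a fit can come back in either orientation. The sketch below is a hedged illustration rather than a diagnosis of the plot above. It uses the built-in `iris` data as a stand-in, and it does not assume that LD2 equals the regression residual; it only measures how the two relate and flips LD2 if they point in opposite directions. The names `aligned` and `same_geometry` are made up for the example.

```r
library(MASS)  # provides lda()

# Stand-in data: two predictors, three groups
fit <- lda(Species ~ Sepal.Length + Sepal.Width, data = iris)
scores <- predict(fit)$x  # matrix with columns LD1 and LD2

# Residuals from a simple regression on the same two variables
res <- residuals(lm(Sepal.Width ~ Sepal.Length, data = iris))

# Does LD2 point the same way as the residuals, the opposite way, or neither?
r <- cor(scores[, "LD2"], res)
print(r)

# Flip LD2 only if it runs against the residuals
aligned <- scores
if (r < 0) aligned[, "LD2"] <- -aligned[, "LD2"]
r_aligned <- cor(aligned[, "LD2"], res)

# A sign flip is a reflection: distances between points are unchanged
same_geometry <- isTRUE(all.equal(as.vector(dist(scores)),
                                  as.vector(dist(aligned))))
print(same_geometry)
```

If the correlation on the real data is strongly negative, negating the LD2 column before plotting would restore the expected orientation without changing the group separation. If it is close to zero, LD2 is probably not measuring deviation from the regression line at all.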
