I am working with scatterplots, for example something like this:
#generate 100 random x and y values between 0 and 10
set.seed(10)
x = runif(100,0,10)
y = runif(100,0,10)
library(ggplot2)
df <- data.frame(x = x, y = y) #ggplot expects a data frame
#plot
plot <- ggplot(df, aes(x = x, y = y)) +
geom_point(size=3, colour = "blue") +
geom_abline(intercept = 0, slope = 1) #add line with slope of 1
print(plot)
Now I want to get two things out of this:
1. The number of points above the line and the number of points below it.
2. The vertical (y-axis) distance of each point from the line, summed into a single value (a sort of summarized residual), separately for the points above and the points below the line.
y_hat <- x * 1 + 0 #y value on the line for each x; change the 1 (slope) and 0 (intercept) for a different line
Resid <- y - y_hat
PosResid <- Resid[Resid >= 0]
NegResid <- Resid[Resid < 0]
#Number of points in each population
length(PosResid)
length(NegResid)
#Sums in each population
sum(PosResid)
sum(NegResid)
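The steps above can also be wrapped into a small function, so the same counts and sums can be computed for any line. This is only a minimal sketch: the function name `summarise_about_line` and its arguments `slope` and `intercept` are hypothetical, not part of the original code. It keeps the same convention as above, so points lying exactly on the line count as "above".

```r
#hypothetical helper: name and arguments are illustrative only
summarise_about_line <- function(x, y, slope = 1, intercept = 0) {
  resid <- y - (slope * x + intercept) #signed vertical distance to the line
  above <- resid >= 0                  #points exactly on the line count as "above"
  list(
    n_above   = sum(above),            #number of points on or above the line
    n_below   = sum(!above),           #number of points below the line
    sum_above = sum(resid[above]),     #summed distances above (non-negative)
    sum_below = sum(resid[!above])     #summed distances below (negative or zero)
  )
}

#same data as in the question
set.seed(10)
x <- runif(100, 0, 10)
y <- runif(100, 0, 10)
res <- summarise_about_line(x, y)
print(res)
```

Note that `sum_below` is negative because the residuals are signed; wrap it in `abs()` if you want a plain distance.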