# ggplot() makes residual plots?!

#1

I discovered this when a student did something I thought was a mistake:

``````
library(ggplot2)
ggplot(lm(Sepal.Length ~ Sepal.Width, data = iris)) +
  geom_point(aes(x = .fitted, y = .resid))
``````

When did this magic happen, and is there any documentation about how to work with it? Is it just converting the `lm` object to a `data.frame` with `broom::augment()`?

My students make residual plots of everything, so an easy way of doing this with ggplot2 would be great.

#2

From what I can see, ggplot2 identifies the input as an object of class `lm` and then calls the `fortify.lm` function (with source code here), which extracts the following columns:

- `.hat`: diagonal of the hat matrix
- `.sigma`: estimate of the residual standard deviation when the corresponding observation is dropped from the model
- `.cooksd`: Cook's distance, as computed by `cooks.distance()`
- `.fitted`: fitted values of the model
- `.resid`: residuals
- `.stdresid`: standardised residuals

As mentioned here, it is advised to use the `broom` package instead, which also supports more model types, as `fortify` may be deprecated in the future.
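For reference, a minimal sketch of the `broom` route, assuming the `broom` package is installed. The object names `fit`, `aug` and `p` are only illustrative. `augment()` returns the model data with `.fitted` and `.resid` columns (among others) added, so the residual plot becomes an ordinary ggplot of a data frame:

``````r
library(ggplot2)
library(broom)

fit <- lm(Sepal.Length ~ Sepal.Width, data = iris)

# augment() returns the model data plus per-observation diagnostics
aug <- augment(fit)

# Residuals against fitted values, with a reference line at zero
p <- ggplot(aug, aes(x = .fitted, y = .resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed")
p
``````

Because the intermediate data frame is explicit here, it is easy to inspect with `head(aug)` before plotting.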

#3

Simon Jackson (@drsimonj on Twitter) has a great post on plotting residuals in R, including with ggplot, here.

#4

Yeah, I teach my students to use broom on the models and then make the plots with the resulting data.frame. But I've been trying to find some shortcuts because it gets old copying and modifying the 20 or so lines of code needed to replicate what `plot.lm()` does with 6 characters.

Yes, DRY, so I should make a function, and I have, but it's not working very well.

"From what I can see ..." -- where? Is this done inside `ggplot()`? I looked for a `ggplot.lm()` method but ...

#5

If you don't mind sharing the function, we could take a look at it and see if we can make it better.

I was a little brief in my last response, so let me try to explain in a bit more depth:

We can see the inner workings of `ggplot()` by typing `ggplot2:::ggplot` in the console (or pressing F2 with the cursor on the function), which gives us the following:

``````
function (data = NULL, mapping = aes(), ..., environment = parent.frame())
{
    UseMethod("ggplot")
}
<environment: namespace:ggplot2>
``````

`UseMethod("ggplot")` tells you that `ggplot()` is an S3 generic function with methods for different object classes. We can list all the methods of `ggplot()` with the `methods()` function:

``````
> methods(ggplot)
[1] ggplot.data.frame* ggplot.default*
see '?methods' for accessing help and source code
``````

which tells us that there are currently two methods for the `ggplot` function. `UseMethod` uses the class of the input to figure out which method to call.
In our case the input was the output of an `lm` call, which has only one class, namely "lm":

``````
> class(lm(Sepal.Length ~ Sepal.Width, data = iris))
[1] "lm"
``````

As you correctly noticed, `ggplot.lm` is not among the available methods, so `UseMethod` falls back to the default method, `ggplot.default`, which it finds and calls. If we look at the source code for `ggplot.default`, we get the following:

``````
> ggplot2:::ggplot.default
function (data = NULL, mapping = aes(), ..., environment = parent.frame())
{
    ggplot.data.frame(fortify(data, ...), mapping, environment = environment)
}
<environment: namespace:ggplot2>
``````

where we can see that `data` is fed into the `fortify` function, which is itself an S3 generic function:

``````
> ggplot2:::fortify
function (model, data, ...)
UseMethod("fortify")
<environment: namespace:ggplot2>
``````

with the following methods

``````
> methods(fortify)
[1] fortify.cld*
[2] fortify.confint.glht*
[3] fortify.data.frame*
[4] fortify.default*
[5] fortify.function*
[6] fortify.glht*
[7] fortify.Line*
[8] fortify.Lines*
[9] fortify.lm*
[10] fortify.map*
[11] fortify.NULL*
[12] fortify.Polygon*
[13] fortify.Polygons*
[14] fortify.SpatialLinesDataFrame*
[15] fortify.SpatialPolygons*
[16] fortify.SpatialPolygonsDataFrame*
[17] fortify.summary.glht*
see '?methods' for accessing help and source code
``````

where we find `fortify.lm`, which I referred to in my last response but will print here again for completeness:

``````
> ggplot2:::fortify.lm
function (model, data = model$model, ...)
{
    infl <- stats::influence(model, do.coef = FALSE)
    data$.hat <- infl$hat
    data$.sigma <- infl$sigma
    data$.cooksd <- stats::cooks.distance(model, infl)
    data$.fitted <- stats::predict(model)
    data$.resid <- stats::resid(model)
    data$.stdresid <- stats::rstandard(model, infl)
    data
}
<environment: namespace:ggplot2>
``````

which extracts the necessary information and returns it as a data frame. That data frame is then fed into the `ggplot.data.frame` method, since it now has the correct structure. Hope this was helpful.
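If you would rather not depend on `fortify.lm` staying around, the same steps can be written out by hand with the `stats` functions it calls. This is only a sketch of that idea, not ggplot2's own code path. The name `diag_data` is made up for illustration, and only three of the columns are built here:

``````r
library(ggplot2)

fit <- lm(Sepal.Length ~ Sepal.Width, data = iris)

# Influence measures, as in the fortify.lm source printed above
infl <- stats::influence(fit, do.coef = FALSE)

# Start from the model frame and add diagnostic columns by hand
diag_data <- fit$model
diag_data$.fitted <- unname(stats::predict(fit))
diag_data$.resid <- unname(stats::resid(fit))
diag_data$.stdresid <- unname(stats::rstandard(fit, infl = infl))

# Ordinary data-frame ggplot: residuals against fitted values
p <- ggplot(diag_data, aes(x = .fitted, y = .resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed")
p
``````

Swapping `.resid` for `.stdresid` in the `aes()` call gives the standardised version of the same plot.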

#6

Thanks! Yes, I always forget to check the default method when I don't see the specific one.

My function:

The main thing I don't like is that it doesn't work with mgcv::gam objects, and it should probably do different things for glm objects.

#7

The `geom_segment` idea is a really cool one!