Slow to Produce Graphs in RStudio

Hi forum,

First posting. I'm using RStudio 1.1.463 with Microsoft R Client 3.4.3.

I use R to fit a lot of GLMs (generalised linear models). When I use the plot() command to produce various diagnostic graphs in RStudio, the plots take a very long time (~20 seconds) to appear in the Plots window in the bottom right corner of my screen.

Is this to be expected? Is there anything I can do to improve this?

Thanks.

Please provide more details about the data and the object you are plotting. How big is the data set and what gets plotted? If there are a great many points on the plot, it could take a while to appear.

1 Like

Thanks. The glm model object is based on about 200,000 observations, which might explain why the graphs are slow to show up (i.e. it's not RStudio that is slow, it's R)? I suppose I could modify my code to plot just a random sample of x% (perhaps 1%) of the data instead, which might capture the essence of what's going on without taking so much time.
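A minimal sketch of the subsampling idea. It assumes simulated Poisson data in place of the real glm_B; the data frame `sim`, the model `fit` and the 1% fraction are all hypothetical:

```r
set.seed(42)  # make the random sample reproducible

# Hypothetical stand-in for the real data: 200,000 Poisson counts
n <- 200000
sim <- data.frame(x = runif(n, 0, 2))
sim$y <- rpois(n, lambda = exp(0.5 + 0.8 * sim$x))
fit <- glm(y ~ x, family = poisson(link = "log"), data = sim)

res <- residuals(fit, type = "deviance")
fitted_vals <- fitted(fit)  # fitted means on the response scale

# Keep a 1% simple random sample of observation indices
frac <- 0.01
idx <- sample(length(res), size = round(frac * length(res)))

# Plot only the sampled residuals
plot(fitted_vals[idx], res[idx],
     xlab = "Fitted value", ylab = "Deviance residual",
     main = "1% random sample of residuals")
abline(h = 0, lty = 2)  # reference line at zero
```

Sampling indices once and reusing them keeps each residual paired with its own fitted value.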

An example of the code I've written (where glm_B is the model object created by glm()):

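res <- residuals(glm_B, type = "deviance")            # deviance residuals
predicted_values <- exp(predict(glm_B))               # link-scale predictions, back-transformed (assumes a log link)
plot(predicted_values, res, ylim = c(-10000, 10000))  # residuals against predicted values
abline(h = 0, lty = 2)                                # dashed reference line at zero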
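The line `exp(predict(glm_B))` works because `predict()` on a glm returns values on the link scale by default. Exponentiating therefore recovers fitted means only when the model uses a log link. That is an assumption here, since the post does not state the family. The small check below uses a hypothetical Gamma model on simulated data. It shows that, for a log link, the result matches `predict(..., type = "response")`, which works for any link:

```r
set.seed(1)

# Hypothetical Gamma model with a log link (not the poster's data)
d <- data.frame(x = runif(1000))
d$y <- rgamma(1000, shape = 2, rate = 2 / exp(1 + d$x))  # mean = exp(1 + x)
m <- glm(y ~ x, family = Gamma(link = "log"), data = d)

via_exp  <- exp(predict(m))                # link scale, then back-transform by hand
via_resp <- predict(m, type = "response")  # response scale directly

all.equal(via_exp, via_resp)               # TRUE for a log link
```

With another link (identity, or logit in a binomial model), `exp()` would put the values on the wrong scale. That makes `type = "response"` (or `fitted()`) the safer choice.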

Well, the plot() command in the following code took about 16 s (counted in my head, so very approximate):

x <- rnorm(200000)
y <- runif(200000, 0, 10)
plot(x, y)
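Rather than counting in your head, base R's `system.time()` can time the drawing. The sketch below draws to an off-screen PDF in a temporary file, so the timing may not match what you see in the RStudio Plots pane. In RStudio you could also simply wrap the original plot() call in system.time():

```r
# Same simulated data as above
x <- rnorm(200000)
y <- runif(200000, 0, 10)

# Draw to an off-screen PDF in a temporary file rather than the Plots pane
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)

timing <- system.time(plot(x, y))  # times only the plotting call
invisible(dev.off())               # close the device so the file is written

timing[["elapsed"]]                # wall-clock seconds
```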

so what you are reporting seems reasonable. I would worry less about the time required and more about the usefulness of the plot. There is probably so much overplotting that you cannot be sure what is going on. A contour plot might be more useful, and that is easily done with ggplot2's geom_density2d(). There is also a contour() function in the standard graphics package, though it might require more preprocessing of the data.
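A sketch of the base-graphics route. `contour()` needs a density already evaluated on a grid, and `MASS::kde2d()` (from the recommended MASS package, assumed installed) is one way to do that preprocessing. Because `kde2d()` builds grid-by-data matrices internally, this sketch estimates the density from a hypothetical 20,000-point random subsample to keep memory modest. The grid size `n = 100` is an arbitrary choice:

```r
library(MASS)  # for kde2d(), a 2-D kernel density estimate on a grid

set.seed(123)
x <- rnorm(200000)
y <- runif(200000, 0, 10)

# kde2d() forms n-by-length(x) matrices, so estimate from a random subsample
sub <- sample(length(x), 20000)

# Preprocessing step: estimate the joint density on a 100 x 100 grid
dens <- kde2d(x[sub], y[sub], n = 100)

# contour() takes the grid coordinates and the matrix of density values
contour(dens$x, dens$y, dens$z,
        xlab = "x", ylab = "y", main = "Density contours")
```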

I suggest you start another thread asking for useful ways to look at 200 000 residual values. I expect people will have good ideas that will give quicker and more informative plots.

1 Like

Thank you.

I've had another look at my code and have now been able to:

  • Produce scatter plots using random samples of just 5%-10% of the residuals, which I think work equally well visually (less overplotting, as you pointed out) and render a lot faster
  • Use ggplot2 to make some contour plot overlays, which look rather splendid :grinning:

...so I think this is a pretty good outcome.
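Roughly what that combination could look like, as a sketch only, since the actual code was not posted. The data frame `diag_df`, the simulated residuals and the 5% fraction are all hypothetical. The point layer uses a random 5% of rows, while `geom_density2d()` draws contours from all rows. The plotting part is skipped if ggplot2 is not installed:

```r
set.seed(2024)

# Hypothetical diagnostics: fitted values and residuals for 50,000 observations
n <- 50000
diag_df <- data.frame(fitted = rexp(n, rate = 0.1),
                      resid  = rnorm(n))

# Random 5% of rows for the point layer (less overplotting, faster drawing)
pts <- diag_df[sample(n, size = round(0.05 * n)), ]

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(diag_df, aes(x = fitted, y = resid)) +
    geom_point(data = pts, alpha = 0.3, size = 0.8) +  # sampled points only
    geom_density2d(colour = "steelblue") +             # contours from all rows
    geom_hline(yintercept = 0, linetype = 2) +         # reference line at zero
    labs(x = "Fitted value", y = "Deviance residual")
  print(p)
}
```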

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.