Alternative to geom_point() when there are too many data points

Hello experts. I'm looking for a ggplot2 geom (i.e., a geom_*() function) that can clearly show the relationship between two variables when there are so many data points that geom_point() isn't a good option because of heavy overplotting. Specifically, I'd like to create this type of plot with ggplot2:

[Image: example plot]
Is there one? If not, any advice will be very welcome!

There are a few possibilities depending on your data:


Thank you! I checked these functions. Unfortunately, they don't seem to have any option for "smoothing out" the borders of the rectangles/hexagons. As a result, the plots they create show a mosaic pattern with many "empty" hexagons/rectangles, which doesn't look good. Is it not possible to create the example plot (exactly as it is) with ggplot2?
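For reference, here is a minimal sketch of the rectangle and hexagon binning described above, using ggplot2's geom_bin2d() and geom_hex(). The example data (two clusters of 10,000 points, as in the replies below) and the `bins = 60` value are arbitrary choices for illustration. Cells that contain no points are not drawn at all, which is where the "mosaic" look comes from.

```r
library(ggplot2)

# Illustrative data: two clusters of 10,000 points each
set.seed(1)
DF <- data.frame(X = c(rnorm(10000), rnorm(10000) + 10),
                 Y = c(rnorm(10000), rnorm(10000) + 10))

# Rectangular bins: each tile is filled by the number of points it contains.
# Bins with no points are simply not drawn, leaving gaps in the plot.
p_rect <- ggplot(DF, aes(X, Y)) +
  geom_bin2d(bins = 60) +
  scale_fill_viridis_c()

# Hexagonal bins; building/printing this plot requires the hexbin package
p_hex <- ggplot(DF, aes(X, Y)) +
  geom_hex(bins = 60) +
  scale_fill_viridis_c()
```

Print `p_rect` or `p_hex` to draw them. Increasing `bins` gives finer cells but also more empty ones where the data are sparse.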

This is not automatic, but I got there, sort of.

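library(ggplot2)

# Two well-separated clusters of 10,000 points each
DF <- data.frame(X = c(rnorm(10000), rnorm(10000) + 10), 
                 Y = c(rnorm(10000), rnorm(10000) + 10))
# Plain scatter plot: heavy overplotting inside each cluster
ggplot(DF, aes(X, Y)) + geom_point()

# 2D kernel density estimate evaluated on a 100 x 100 grid
KDE <- MASS::kde2d(x = DF$X, y = DF$Y, n = 100)
# Flatten the grid to long format. KDE$z[i, j] is the density at (KDE$x[i], KDE$y[j]);
# as.vector() reads the matrix column by column, so X must vary fastest.
DFnew <- data.frame(X = rep(KDE$x, 100), Y = rep(KDE$y, each = 100), 
                    Z = as.vector(KDE$z))
# Filled contour bands of the estimated density
ggplot(DFnew, aes(x = X, y = Y, z = Z)) + geom_contour_filled()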

Created on 2020-05-27 by the reprex package (v0.3.0)
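The same kind of kernel density estimate can also be computed inside ggplot2 itself, without building the grid by hand. A minimal sketch, assuming ggplot2 >= 3.3.0 (for after_stat()): stat_density_2d() with `contour = FALSE` returns the density on an `n` x `n` grid, and `geom = "raster"` draws it as a continuous colour surface rather than contour bands.

```r
library(ggplot2)

# Same style of example data as above
set.seed(1)
DF <- data.frame(X = c(rnorm(10000), rnorm(10000) + 10),
                 Y = c(rnorm(10000), rnorm(10000) + 10))

# ggplot2 computes the 2D density on a 100 x 100 grid and
# fills each raster cell by the estimated density
p <- ggplot(DF, aes(X, Y)) +
  stat_density_2d(aes(fill = after_stat(density)),
                  geom = "raster", contour = FALSE, n = 100) +
  scale_fill_viridis_c()
```

Print `p` to draw it. Compared with the manual approach, this avoids reshaping the kde2d() output, at the cost of less direct control over the intermediate grid.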


I suppose you could down-sample your data before plotting, using dplyr's sample_n().
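A minimal sketch of that down-sampling idea, assuming the dplyr package is installed. The sample size of 2,000 rows is an arbitrary choice for illustration; in dplyr 1.0.0 and later, slice_sample() is the recommended replacement for sample_n().

```r
library(ggplot2)
library(dplyr)

# Illustrative data: two clusters of 10,000 points each
set.seed(1)
DF <- data.frame(X = c(rnorm(10000), rnorm(10000) + 10),
                 Y = c(rnorm(10000), rnorm(10000) + 10))

# Keep a random 2,000 of the 20,000 rows before plotting
DF_small <- sample_n(DF, 2000)
# Equivalent in dplyr >= 1.0.0:
# DF_small <- slice_sample(DF, n = 2000)

p <- ggplot(DF_small, aes(X, Y)) +
  geom_point()
```

This reduces overplotting but discards data, so sparse regions and outliers may disappear from the plot.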


That applies to my second link, but I think the first one, for geom_raster(), is along the lines of what you are after:

library(ggplot2)

ggplot(faithfuld, aes(waiting, eruptions)) +
  geom_raster(aes(fill = density)) + # or geom_raster(aes(fill = density), interpolate = TRUE)
  scale_fill_viridis_c() # many possibilities for colours here

Alternatively, FJCC has a very similar solution based on (I believe) the same underlying density calculations.


Perhaps I misunderstand your need, but this may work for you: https://ggplot2.tidyverse.org/reference/geom_density_2d.html
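A minimal sketch of geom_density_2d() on data like that used elsewhere in this thread. The grey point layer underneath is an illustrative choice, and geom_density_2d_filled() assumes ggplot2 >= 3.3.0.

```r
library(ggplot2)

# Illustrative data: two clusters of 10,000 points each
set.seed(1)
DF <- data.frame(X = c(rnorm(10000), rnorm(10000) + 10),
                 Y = c(rnorm(10000), rnorm(10000) + 10))

# Density contour lines drawn on top of faint points
p <- ggplot(DF, aes(X, Y)) +
  geom_point(size = 0.3, colour = "grey70") +
  geom_density_2d(colour = "black")

# Filled contour bands of the same 2D density estimate
p_filled <- ggplot(DF, aes(X, Y)) +
  geom_density_2d_filled()
```

The line version keeps the raw points visible; the filled version is close to the contour plot built manually from MASS::kde2d() earlier in the thread.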


I really like geom_pointdensity() from the ggpointdensity package, which is something of a hybrid: it shows both the point density and the individual outlying points. Using @FJCC's example data:

library(ggplot2)
library(ggpointdensity)

DF <- data.frame(X = c(rnorm(10000), rnorm(10000) + 10), 
                 Y = c(rnorm(10000), rnorm(10000) + 10))

ggplot(DF, aes(X, Y)) +
  geom_pointdensity() +
  scale_color_viridis_c()

Created on 2020-05-28 by the reprex package (v0.3.0)

