ggplot2::geom_sf performance

ggplot2
mac
sf
graphics

#1

I am on a Mac (2016 MBP 15" with 16GB memory) and am testing out the development version of ggplot2 with sf plotting support.

Overall I am loving it, but I am finding geom_sf to be very slow when plotting larger sf polygon objects. I think I’ve narrowed it down to the quartz graphics device, which is much slower than X11 on these large objects (although quartz produces nicer-looking graphics than X11).

I would be happy to just use X11, but I like having plots appear in the RStudio plot pane, and I believe the RStudio graphics device uses quartz on macOS with no way to change that. So I’m not sure whether this is a ggplot2/sf issue, a macOS quartz issue, or possibly even an RStudio issue, but I thought I would post it here first to see if others are experiencing the same thing.

I also did the same comparison with sf's plot method: there is still a large difference between X11 and quartz, but overall it is faster than ggplot2::geom_sf.

Interestingly, quartz actually is faster than X11 on the smaller objects.

Reprex with timings; session_info at the end:

library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.1.3, proj.4 4.9.3
library(ggplot2)

## Plotting a small polygons object with ggplot2::geom_sf
nc <- read_sf(system.file("gpkg/nc.gpkg", package = "sf"))

## Create ggplot2 object
nc_gg <- ggplot() + geom_sf(data = nc)

## geom_sf with X11
X11(type = "cairo")
system.time(print(nc_gg))

#>    user  system elapsed 
#>   0.761   0.150   2.570
graphics.off()

## geom_sf with quartz
quartz()
system.time(print(nc_gg))

#>    user  system elapsed 
#>   1.003   0.159   1.264
graphics.off()

## Plotting a large polygons object with ggplot2::geom_sf
tmpzip <- tempfile(fileext = ".zip")
download.file("https://github.com/bcgov/bcmaps.rdata/blob/master/data-raw/ecoregions/ecoregions.zip?raw=true", destfile = tmpzip)
gdb_path <- unzip(tmpzip, exdir = tempdir())
ecoregions <- sf::read_sf(dirname(gdb_path[1]))

## Create ggplot2 object
ecoregions_gg <- ggplot() + geom_sf(data = ecoregions)

## geom_sf with X11
X11(type = "cairo")
system.time(print(ecoregions_gg))

#>    user  system elapsed 
#>   2.948   0.363   3.854
graphics.off()

## geom_sf with quartz
quartz()
system.time(print(ecoregions_gg))

#>    user  system elapsed 
#>  97.607   0.686  98.370
graphics.off()

## For comparison, here are timings using sf's plot method:

## Small object
## plot_sf with X11
X11(type = "cairo")
system.time(plot(st_geometry(nc)))

#>    user  system elapsed 
#>   0.042   0.127   1.788
graphics.off()

## plot_sf with quartz
quartz()
system.time(plot(st_geometry(nc)))

#>    user  system elapsed 
#>   0.210   0.062   0.271
graphics.off()

## Large object
## plot_sf with X11
X11(type = "cairo")
system.time(plot(st_geometry(ecoregions)))

#>    user  system elapsed 
#>   1.078   0.628   8.068
graphics.off()

## plot_sf with quartz
quartz()
system.time(plot(st_geometry(ecoregions)))

#>    user  system elapsed 
#>  44.291   0.393  44.665
graphics.off()
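To check whether the extra time goes into ggplot2 itself or into the graphics device, the build and draw steps can be timed separately. A minimal sketch, using ggplot2's `ggplotGrob()` and grid's `grid.draw()`; `time_plot_stages()` is a hypothetical helper, not part of ggplot2 or sf, and `pdf(NULL)` is only an assumed "near no-op" baseline device:

```r
# Hypothetical helper (not part of ggplot2 or sf): time building the plot's
# gtable separately from drawing it on a given graphics device.
time_plot_stages <- function(p, open_device = function() grDevices::pdf(NULL)) {
  # Open the device first so that ggplotGrob() does not open a default one
  open_device()
  dev <- grDevices::dev.cur()
  on.exit(grDevices::dev.off(dev))  # close only the device opened here
  # Stage 1: ggplot2 computes scales, stats and grobs (mostly device-independent)
  build <- system.time(g <- ggplot2::ggplotGrob(p))
  # Stage 2: grid sends the finished grobs to the open device
  draw <- system.time(grid::grid.draw(g))
  c(build = build[["elapsed"]], draw = draw[["elapsed"]])
}

# Usage on macOS, with the objects from the reprex above:
# time_plot_stages(ecoregions_gg, function() quartz())
# time_plot_stages(ecoregions_gg, function() X11(type = "cairo"))
```

If the `build` time is similar across devices and only `draw` differs, that would point at the device rather than geom_sf.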
Session info
devtools::session_info()
#> Session info -------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.4.2 (2017-09-28)
#>  system   x86_64, darwin15.6.0        
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_CA.UTF-8                 
#>  tz       America/Vancouver           
#>  date     2017-11-28
#> Packages -----------------------------------------------------------------
#>  package    * version    date       source                            
#>  assertthat   0.2.0      2017-04-11 CRAN (R 3.4.0)                    
#>  backports    1.1.1      2017-09-25 CRAN (R 3.4.2)                    
#>  base       * 3.4.2      2017-10-04 local                             
#>  bindr        0.1        2016-11-13 CRAN (R 3.4.0)                    
#>  bindrcpp     0.2        2017-06-17 CRAN (R 3.4.0)                    
#>  bitops       1.0-6      2013-08-17 CRAN (R 3.4.0)                    
#>  class        7.3-14     2015-08-30 CRAN (R 3.4.2)                    
#>  classInt     0.1-24     2017-04-16 CRAN (R 3.4.0)                    
#>  colorspace   1.3-2      2016-12-14 CRAN (R 3.4.0)                    
#>  compiler     3.4.2      2017-10-04 local                             
#>  datasets   * 3.4.2      2017-10-04 local                             
#>  DBI          0.7        2017-06-18 CRAN (R 3.4.0)                    
#>  devtools     1.13.4     2017-11-09 CRAN (R 3.4.2)                    
#>  digest       0.6.12     2017-01-27 CRAN (R 3.4.0)                    
#>  dplyr        0.7.4      2017-09-28 CRAN (R 3.4.2)                    
#>  e1071        1.6-8      2017-02-02 CRAN (R 3.4.0)                    
#>  evaluate     0.10.1     2017-06-24 CRAN (R 3.4.1)                    
#>  ggplot2    * 2.2.1.9000 2017-11-17 Github (tidyverse/ggplot2@582acfe)
#>  glue         1.2.0      2017-10-29 CRAN (R 3.4.2)                    
#>  graphics   * 3.4.2      2017-10-04 local                             
#>  grDevices  * 3.4.2      2017-10-04 local                             
#>  grid         3.4.2      2017-10-04 local                             
#>  gtable       0.2.0      2016-02-26 CRAN (R 3.4.0)                    
#>  htmltools    0.3.6      2017-04-28 CRAN (R 3.4.0)                    
#>  knitr        1.17       2017-08-10 CRAN (R 3.4.1)                    
#>  lazyeval     0.2.1      2017-10-29 CRAN (R 3.4.2)                    
#>  magrittr     1.5        2014-11-22 CRAN (R 3.4.0)                    
#>  memoise      1.1.0      2017-04-21 CRAN (R 3.4.0)                    
#>  methods    * 3.4.2      2017-10-04 local                             
#>  munsell      0.4.3      2016-02-13 CRAN (R 3.4.0)                    
#>  pkgconfig    2.0.1      2017-03-21 CRAN (R 3.4.0)                    
#>  plyr         1.8.4      2016-06-08 CRAN (R 3.4.0)                    
#>  R6           2.2.2      2017-06-17 CRAN (R 3.4.0)                    
#>  Rcpp         0.12.14    2017-11-23 CRAN (R 3.4.3)                    
#>  RCurl        1.95-4.8   2016-03-01 CRAN (R 3.4.0)                    
#>  rlang        0.1.4      2017-11-05 CRAN (R 3.4.2)                    
#>  rmarkdown    1.8        2017-11-17 CRAN (R 3.4.2)                    
#>  rprojroot    1.2        2017-01-16 CRAN (R 3.4.0)                    
#>  scales       0.5.0.9000 2017-10-19 Github (hadley/scales@d767915)    
#>  sf         * 0.5-5      2017-10-31 CRAN (R 3.4.2)                    
#>  stats      * 3.4.2      2017-10-04 local                             
#>  stringi      1.1.6      2017-11-17 CRAN (R 3.4.2)                    
#>  stringr      1.2.0      2017-02-18 CRAN (R 3.4.0)                    
#>  tibble       1.3.4      2017-08-22 CRAN (R 3.4.1)                    
#>  tools        3.4.2      2017-10-04 local                             
#>  udunits2     0.13       2016-11-17 CRAN (R 3.4.0)                    
#>  units        0.4-6      2017-08-27 CRAN (R 3.4.1)                    
#>  utils      * 3.4.2      2017-10-04 local                             
#>  withr        2.1.0.9000 2017-11-17 Github (jimhester/withr@daf5a8c)  
#>  XML          3.98-1.9   2017-06-19 CRAN (R 3.4.1)                    
#>  yaml         2.1.14     2016-11-12 CRAN (R 3.4.0)

#2

2013 (but super beefy) Mac, macOS 10.13.2 beta 5, CRAN R 3.4.2.

X11(type = "cairo")
system.time(print(nc_gg))
#>    user  system elapsed
#>   0.394   0.126   2.104
graphics.off()

quartz()
system.time(print(nc_gg))
#>    user  system elapsed
#>   0.851   0.222   1.258
graphics.off()


X11(type = "cairo")
system.time(print(ecoregions_gg))
#>    user  system elapsed
#>   2.796   0.341   3.694
graphics.off()

quartz()
system.time(print(ecoregions_gg))
#>    user  system elapsed
#> 102.148   0.699 102.962
graphics.off()

X11 + Cairo (obviously) blew past Quartz.

HOWEVER

When I run X11 with nbcairo:

X11(type = "nbcairo")
system.time(print(ecoregions_gg))
#>    user  system elapsed
#>  21.163   0.505  22.110
graphics.off()

AND

Quartz with antialiasing turned off is:

quartz(antialias = FALSE)
system.time(print(ecoregions_gg))
#>    user  system elapsed
#>  52.603   0.959  53.605
graphics.off()

AND

disabling Quartz antialiasing and using a fixed dpi of 72:

quartz(antialias = FALSE, dpi = 72)
system.time(print(ecoregions_gg))
#>    user  system elapsed
#>  46.039   0.724  46.922
graphics.off()

Interestingly enough, using Quartz to make a PDF is pretty fast:

quartz(type = "pdf", dpi = 144, antialias = TRUE, file = "/tmp/a.pdf")
system.time(print(ecoregions_gg))
#>    user  system elapsed
#>   4.071   0.349   4.670
graphics.off()
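Since writing a PDF was fast in this test, one possible workaround for large maps is to render to a file and open that instead of drawing on screen. A minimal sketch; `preview_pdf()` is a hypothetical helper, and it uses the standard `grDevices::pdf()` device (swapped in for `quartz(type = "pdf")` so it also runs off macOS), whose speed on this data was not measured here:

```r
# Hypothetical helper: draw to a temporary PDF instead of an on-screen device.
# Uses grDevices::pdf(); on macOS, quartz(type = "pdf", file = path) could be
# substituted to match the timing above.
preview_pdf <- function(draw, width = 7, height = 7, open = interactive()) {
  path <- tempfile(fileext = ".pdf")
  grDevices::pdf(path, width = width, height = height)
  dev <- grDevices::dev.cur()
  # Always close the PDF device, even if drawing fails
  tryCatch(draw(), finally = grDevices::dev.off(dev))
  # Hand the finished file to the system viewer when running interactively
  if (open) utils::browseURL(path)
  invisible(path)
}

# Usage: preview_pdf(function() print(ecoregions_gg))
```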

Using “native” (which I think is what RStudio uses internally) makes super nice-looking plots on Retina displays but also takes forever. “Cocoa” wasn’t much better.
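To rerun a comparison like this without repeating the open/time/close steps for every variant, the device setups can be collected in a named list. A minimal sketch; `compare_devices()` is a hypothetical helper, and the device arguments in the usage comment are the ones tried in this post:

```r
# Hypothetical helper: time one drawing function on several device setups.
# `devices` is a named list of zero-argument functions that each open a device.
compare_devices <- function(draw, devices) {
  elapsed <- vapply(devices, function(open_device) {
    open_device()
    dev <- grDevices::dev.cur()
    on.exit(grDevices::dev.off(dev))  # close only the device opened here
    system.time(draw())[["elapsed"]]
  }, numeric(1))
  data.frame(device = names(devices), elapsed = unname(elapsed),
             stringsAsFactors = FALSE)
}

# Usage on macOS with the objects from the first post:
# compare_devices(
#   function() print(ecoregions_gg),
#   list(
#     quartz_default  = function() quartz(),
#     quartz_no_aa    = function() quartz(antialias = FALSE),
#     quartz_no_aa_72 = function() quartz(antialias = FALSE, dpi = 72),
#     quartz_cocoa    = function() quartz(type = "Cocoa"),
#     x11_cairo       = function() X11(type = "cairo"),
#     x11_nbcairo     = function() X11(type = "nbcairo")
#   )
# )
```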


#3

Thank you so much @hrbrmstr for taking the time to run the examples and confirming I’m not crazy. My colleague on a Windows machine with much worse specs than my Mac got timings that were a fraction of mine.

It makes me wonder whether there could be an option in RStudio (per project and/or globally) to change the plotting device, so that X11(type = "cairo") (or whatever other device a user wants) could be used.


#4

You can set this in options:

options(device = "X11")

should do the trick. If you need it every time, you can put that line in a .Rprofile in your home directory or add it to Rprofile.site in the etc directory of your R installation.


#5

Yup, and even better with X11.options(type = "cairo") so you get semi-transparency. The only downside is that if you're using RStudio, then the plotting occurs in a separate window rather than in the IDE plot pane.
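The two suggestions above can be combined in `~/.Rprofile`. A minimal sketch, assuming macOS with XQuartz installed; the `capabilities("X11")` guard is an addition (not from this thread) so that R keeps its default device when the X11 device is unavailable:

```r
# Sketch for ~/.Rprofile (assumes macOS with XQuartz installed).
# grDevices is referenced with `::` because .Rprofile runs before the
# default packages are attached.
local({
  on_mac <- identical(Sys.info()[["sysname"]], "Darwin")
  if (on_mac && capabilities("X11")) {
    grDevices::X11.options(type = "cairo")  # cairo backend: semi-transparency
    options(device = "X11")                 # new plots open in an X11 window
  }
})
```

As noted above, in RStudio this means plots open in a separate X11 window rather than the plot pane.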


#6

This is tangential, but I'm seeing this too for large shapefiles. I want to use ggplot() + geom_sf() to create graphics of every ward in Wisconsin (a larger .shp with many, many more polygons). I'm on macOS, and although I'd like to switch graphics devices from quartz to X11 (or anything comparably fast), I'm struggling to get there even after installing XQuartz. Any advice?

Even when calling R within an XQuartz terminal, where base::capabilities('X11') evaluates to TRUE, I'm seeing the error below:

## geom_sf with X11
X11(type = "cairo")
#> Warning in X11(type = "cairo"): unable to open connection to X11 display ' '
#> Error in .External2(C_X11, d$display, d$width, d$height, d$pointsize, : unable to start device X11cairo
system.time(print(nc_gg))
#>    user  system elapsed # R just defaults to Quartz ...
#>   1.200   0.009   1.209
opts <- options()
xopts <- X11.options()
opts$device
#> [1] "X11"
xopts$display
#> [1] ""
# The default `X11.options()$display` value "" means R uses the DISPLAY environment variable
xopts$type
#> [1] "cairo"
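The warning reports an empty display name, which suggests `DISPLAY` was not visible to that R session. A minimal diagnostic sketch; `check_x11()` is a hypothetical helper, and the `":0"` value in the usage comment is an assumption about where the XQuartz server listens, not something confirmed in this thread:

```r
# Hypothetical diagnostic helper: report what R can see of the X11 setup.
check_x11 <- function() {
  display <- Sys.getenv("DISPLAY")  # "" when the variable is unset
  list(
    display     = display,
    display_set = nzchar(display),
    x11_capable = unname(capabilities("X11"))
  )
}

# Usage (macOS): if display_set is FALSE, try pointing R at the XQuartz
# server explicitly. ":0" is an assumption; run `echo $DISPLAY` in an
# XQuartz xterm to see the actual value.
# check_x11()
# Sys.setenv(DISPLAY = ":0")
# X11(type = "cairo")
```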

#7

I really don't know much about the internals of graphics devices, but now that resource-intensive plots are common thanks to sf, this seems like an area with lots of room for useful optimization. For one, everything still runs single-threaded. I'm sure parallelizing would be non-trivial, since everything has to end up in a single plot, but since each polygon appears to be processed separately, it might be feasible. If anyone has ideas about how to improve the situation... well, all I can offer is a willingness to pitch in insofar as I can and my eternal gratitude.