ggplot2 and plotly are slow with a large dataset

Hi all, I'm here to get some help: what should I do when ggplot2 and plotly take too long to load a plot for a large dataset? I'm frustrated and haven't found a proper solution.
Thanks!

1 Like

How large is your dataset, and which kind of graph are you trying to plot?

I have a list of about 100,000 customers, and I am trying to draw a scatter plot.

Welcome to community.rstudio.com!

There are a number of ways you might speed up your code. A good way to get specific advice on this is to provide a reproducible example (what folks usually refer to as a reprex) of your problem, data and code. (And I'd suggest offering a smaller, but representative dataset to start with).

In terms of general strategies to speed up your R-code (or any code for that matter), I would suggest checking out @csgillespie's "Efficient optimisation" chapter from "Efficient R programming". If you can overlook his (British) misspellings of optimisation, there's a wealth of good advice there.

Given the limited info about the issue so far, here are some ideas I might work through:

  • Data loading: can you load your data faster, perhaps via a database?
  • Are there computationally costly operations that can be avoided or optimized?
  • Can you benefit from parallelisation?
  • Can you reduce how much data your visualization needs to plot (e.g. smoothers or densities)?
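For the last bullet, here is a minimal sketch of two common reductions; the customers data frame and its column names are invented for illustration, standing in for the 100,000-customer list:

```r
library(ggplot2)

# Hypothetical stand-in for the 100,000-customer dataset
set.seed(1)
customers <- data.frame(visits = rnorm(1e5), spend = rnorm(1e5))

# Option 1: plot a random sample of the points instead of all of them
idx <- sample(nrow(customers), 1e4)
p_sample <- ggplot(customers[idx, ], aes(visits, spend)) +
  geom_point(alpha = 0.3)

# Option 2: replace 100,000 raw points with a 2-D binned summary
p_binned <- ggplot(customers, aes(visits, spend)) +
  geom_bin2d(bins = 50)
```

Either option cuts the number of graphical objects ggplot2 (and later plotly) has to draw, which is usually where the time goes.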
4 Likes

Thanks a lot. In fact, I am developing an R visual for Power BI reporting. I have it working, but it's too slow, even though I used the BW theme.

Can you share some code? I find it odd that you're experiencing this; I, for example, work with 8 billion entries and it's rather fast.

3 Likes

It is surprising that ggplot2 cannot handle such a small dataset. Can you provide your computer hardware info and the structure of your data?

1 Like

This is the code:
# Values is the list of measures Power BI passes to this visual.
names(Values) <- gsub('/', '', gsub(' ', '', names(Values)))
n <- names(Values)
st1 <- paste('Values$', n[1], sep = '')
value1 <- eval(parse(text = st1))
st2 <- paste('Values$', n[2], sep = '')
value2 <- eval(parse(text = st2))
st3 <- paste('Values$', n[3], sep = '')
value3 <- eval(parse(text = st3))
# qplot() does not take an aes() mapping; use ggplot(), and geom_pointr() should be geom_point()
base_p <- ggplot(Values, aes(x = value3, y = value2, text = paste(st1, ": ", value1)))
g <- base_p + geom_point() + theme_classic() + theme(legend.position = "none") +
  labs(x = gsub('Values$', '', st3, fixed = TRUE),  # fixed = TRUE: '$' is a regex metacharacter
       y = gsub('Values$', '', st2, fixed = TRUE))
p <- ggplotly(g)
internalSaveWidget(p, 'out.html')

I believe the problem here is not the plotting but the gsub and paste calls. I would try stringr or a similar package instead.
It's also not optimal to put gsub calls inside your ggplot arguments; the labs() call, for example, is something to avoid. Save the axis label as an object first and you'll see that it's faster.

I also don't really understand why you run the plotly conversion on a ggplot object. It would be easier to simply call plot_ly directly with the objects you want to plot.

Overall, this is not about ggplot or plotly but about the fact that there is too much going on. Avoid the gsub calls, and don't include them in ggplot's arguments.
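To illustrate dropping the eval(parse(...)) pattern, a small sketch; the measure names below are invented stand-ins for whatever Power BI passes in. A list (or data frame) can be indexed directly with [[ ]]:

```r
# Hypothetical stand-in for the Values list Power BI passes to the visual
Values <- list("Cust/Id" = 1:3, "Total Sales" = c(10, 20, 30), "Visits" = c(1, 2, 3))

# Clean the names once, up front
names(Values) <- gsub(' ', '', gsub('/', '', names(Values)))
n <- names(Values)

# Direct indexing replaces eval(parse(text = paste0('Values$', n[1])))
value1 <- Values[[n[1]]]
```

This avoids building and parsing code strings on every measure, and the plain names in n can be reused as axis labels with no gsub inside the plot call.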

Ok, thanks a lot. I'll give you feedback soon.

1 Like

A while back we did a comparison of plotly and ggplotly. Unsurprisingly, ggplotly is slower; it has more work to do, after all. We found that for our Shiny dashboard, using pure plotly code resulted in a significant speed-up.

2 Likes

If you change ggplotly(g) to partial_bundle(toWebGL(ggplotly(g))), you'll likely see significant improvements in rendering performance, especially for a simple scatterplot with lots of points.
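In context, that might look like the following sketch; the data frame here is invented, and it assumes the ggplot2 and plotly packages are installed:

```r
library(ggplot2)
library(plotly)

# Hypothetical scatter with many points, mirroring the use case above
d <- data.frame(x = rnorm(1e5), y = rnorm(1e5))
g <- ggplot(d, aes(x, y)) + geom_point()

# Convert to plotly, draw the points with WebGL, and trim the plotly.js bundle
p <- partial_bundle(toWebGL(ggplotly(g)))
```

toWebGL() switches the scatter trace from SVG to WebGL rendering, and partial_bundle() ships only the plotly.js modules the figure actually uses.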

I don't know what internalSaveWidget() does, but it may be contributing to the poor performance. My guess is that it's similar to htmlwidgets::saveWidget(), which is rather naive about plotting multiple graphs from the same library in the same notebook (it will inject the same JS/CSS dependencies every time you print). Jupyter notebooks used to do this too, but I recently added a "dependency manager" to avoid the duplication and also remove the pandoc dependency.

2 Likes

I was having similar speed problems with ggplot2 in Power BI, even when the data frames were small (about 5,000 rows and 5 columns). I then tried the same plots using lattice and was surprised that lattice is a lot faster. Though some multivariate plots still take over 20 seconds to render, lattice is working out for me. Just wanted to share my experience.

1 Like