I am using a plyr function ddply() to aggregate records and columns into string.
The input is dataset from SQL server.
The output is a table in shiny app.
If the records ~10,000 the output table takes >5 seconds.
Preprocessing of data involves removing of un-needed columns, changing date format and column names only.
My question in general is that is there any way to speed up this process. Like in SQL SERVER MANAGEMENT STUDIOS query for 10,000,000 records can generate results instantaneously.
This doesn't sound like a specific problem with Shiny, but with the database or R in general. How are you getting the 10,000 rows? Could it be a slow query or connection speed in the server? Can you get the same 10,000 rows faster in Python or another client? That would help separate issues in the database vs issues in the R code. Maybe add the code you are using to see if there is something that can be done.
I haven't used that SQL Management Studio, but I would be careful comparing these results. Are you getting all the results or the first page of results? That could explain the speed differences.
In principle (I know too little about your use case to be too specific) it is a good idea to offload the aggregation to your database. Databases were designed with aggregation in mind.
Connect the table as a remote one with dbplyr, do some dplyr tricks and once finished retrieve the summary result with collect(). Keep the data transfer to minimum, and perform as much logic as you can inside the database.
There is no reason why R should be any slower than SQL management studio.
Sorry, for delayed response as I was offline for those days. I will try understand what you said. Just, I am new to this hence, don't understand remote connection and all that stuff. Thanking you for your help and support.
Sorry, for the delayed reply. I haven't worked in Python. But in Management Studio, all records are displayed if called. There, might be some problem in the code because the connection gets terminated frequently and have to reconnect to the server.(Don't know why). I will learn new approaches and will try to update the codes.