Handling big data in R

rstudio

#1

Hello,
I am using Shiny to build a BI application, but I have a huge SAS data set to import (around 30 GB). I am currently using the haven package, but I need to know whether there is another way to import the data, because right now read_sas takes about an hour just to load it.


#2

If you can convert the data into another format then you have some options.

For large data you could consider a database: https://db.rstudio.com/

For CSV files, data.table::fread should be quick. Other options are the feather or fst packages, which use their own file formats.
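To make the fread and fst options a bit more concrete, here is a minimal sketch. The `sales` table, its column names, and the temporary file paths are made up for illustration and simply stand in for data that has already been converted out of SAS; the point is that both readers can load only the columns (and, for fst, only the rows) you actually need.

```r
library(data.table)
library(fst)

# Hypothetical small table standing in for the converted SAS data
sales <- data.table(
  id     = 1:5,
  region = c("N", "S", "N", "E", "S"),
  amount = c(10, 20, 30, 40, 50)
)

# Temporary files used only for this illustration
csv_path <- tempfile(fileext = ".csv")
fst_path <- tempfile(fileext = ".fst")

# Write the same table once as CSV and once in the fst format
fwrite(sales, csv_path)
write_fst(sales, fst_path)

# fread can read just the columns you need
from_csv <- fread(csv_path, select = c("id", "amount"))

# read_fst can read a subset of columns and a range of rows
from_fst <- read_fst(
  fst_path,
  columns = c("id", "amount"),
  from = 2,
  to = 4,
  as.data.table = TRUE
)
```

Reading only selected columns or row ranges keeps less data in memory, which matters with a data set of this size.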

However, bear in mind that R needs to hold the data in RAM, so unless you have roughly 64 GB of RAM or more this will not work and you will need a database. There are some workarounds for reading data from disk when there is insufficient RAM, but I have not used them and so cannot comment on them.


#3

@martin.R makes some good suggestions about different packages and about using a database to store the data.

I’ll add a non-technical question: do you actually need all 30 GB of data at once? Or can you condense it or break it up into smaller, more manageable sections?


#4

Thanks, I will try to convert my data.
For now I don’t have 64 GB of RAM, but I was thinking of adding more memory, since I am on a Windows Server system.


#5

I need to use all of it, since the goal is to merge it with another table so that I can compute some KPIs using all the available information.


#6

If you need to join the data to another table (and then presumably do some summarizing on the joined lot) then you have a strong case for using a database backend.

If you can offload both your SAS data and “the other tables” to a database server - which, unlike R, is not constrained by available memory - you could let the database do the heavy lifting and build a relatively lightweight application in R that calls the data remotely via DBI and dplyr linked tibbles, and only later collects and visualizes the (much smaller) summary result.
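Here is a rough sketch of that workflow. It assumes the dbplyr and RSQLite packages are installed and uses an in-memory SQLite database purely as a stand-in for a real database server; the `sales` and `customers` tables and their columns are invented for the example. The join and the aggregation are translated to SQL and run inside the database, and only the small summary is pulled into R by `collect()`.

```r
library(DBI)
library(dplyr)

# In-memory SQLite stands in for a real database server (assumption)
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical tables standing in for the SAS data and "the other table"
dbWriteTable(con, "sales", data.frame(
  customer_id = c(1, 1, 2, 3),
  amount      = c(10, 20, 30, 40)
))
dbWriteTable(con, "customers", data.frame(
  customer_id = c(1, 2, 3),
  region      = c("North", "South", "North")
))

# Lazy references: no rows are loaded into R at this point
sales     <- tbl(con, "sales")
customers <- tbl(con, "customers")

# The join and summary run in the database; collect() fetches the result
kpi <- sales %>%
  inner_join(customers, by = "customer_id") %>%
  group_by(region) %>%
  summarise(total_amount = sum(amount, na.rm = TRUE)) %>%
  arrange(region) %>%
  collect()

dbDisconnect(con)
```

In a Shiny app, `kpi` would be the only object that needs to live in R's memory; the large tables stay in the database.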


#7

This is a good option.

As an alternative (to save you creating a separate database), could you or your collaborators perform the necessary summaries in SAS itself, and so provide R with a smaller set of results to visualise?


#8

Thanks a lot. I think I will use your solution, since it is much the same idea as Martin's.
Thanks, guys.


#9

The best way to work with Shiny is to store the data you want to present in MySQL or Redis and to pre-process it thoroughly.

Even a proficient Java web developer will give you the same advice.