RStudio crash every time I ran large amount of data fro SQL


#1

Hi,

I'm trying to query around 9M rows from our database. Sometimes it was successful but most of the time crashes. I am using simple code to run this:

library(DBI)
library(RMySQL)

con <- dbConnect(RMySQL::MySQL(),
                             dbname = 'database',
                             host = 'server.ip',
                             port = 'port',
                             user = 'my_user',
                             password = 'my_pass')

query <- 'select * from My_table'

sendQ <- dbSendQuery(con, query)

my_files <-dbFetch(sendQ, n = -1)
  

5mins after I ran dbFetch() it crashes. Is it because I tried to get large amount of data? Is it the specs of my PC affects on this as well?


#2

This kind of operation likely requires a large amount of RAM in order to pull all of the data into local memory. How much RAM is available on your computer?

Additionally, what does the data you are reading look like? If it is all numeric data than you will need less RAM then if the data is all long character strings. It also depends on the number of columns being read in.


#3

Thanks tbradley, I do have 16GB RAM on my computer (with shared PhP applications which was simultaneously accessed by 20 users) .

My data would be 105 columns with a combination of numeric and long character strings. Do you have any suggestions on what specs of the computer should be? Or should I separate RStudio with the other applications? Or maybe a work around on how to extract MySQL?


#4

As you are working with database, you should try to work as much as possible remotely, doing calculation and further transformation in-database.
Do you know about dplyr feature to work with DB thanks to dbplyr ?
See website : http://db.rstudio.com/dplyr/

With this, you'll be able to reduce the size of data that you need to collect for a graph or other R only operation.

dplyr on databases is very powerful, you should give a try to not do directly a SELECT *. Have you tried that already ?


#5

In addition to @cderv's suggestion, I would guess that RStudio is crashing because you do not have enough available memory to query 9 million rows x 105 columns of varying data type.

As @cderv hinted at in his last comment, unless you absolutely need all 105 columns that first thing I would try is to narrow down the number of columns that you actually need.

Although in your case, doing most work within the database might be better, as suggested above.


#6

Thanks cderv, it is good to know that dplyr can do the job. @tbradley you have a good point, maybe I should narrow down my columns. Thank you for the response guys


#7

If your question's been answered (even by you!), would you mind choosing a solution? It helps other people see which questions still need help, or find solutions if they have similar problems. Here’s how to do it:


#8

Hi @tbradley

Regarding on RAM, which one do you suggest? VM or Bare Metal?