Slow reads from a remote Postgres database using RPostgres

database
dbplyr
postgres

#1

I'm hoping to get some troubleshooting ideas for a very slow pull from a remote Postgres database using RPostgres. The load on the server is very low, yet it is taking 240s to collect() an 11,500 x 7 table. How can I go about troubleshooting why this is taking so long?

Very basic query:

df <- con %>%
    tbl("myTable") %>% 
    collect()
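One way to narrow down where the time goes is to time the pieces separately: the raw driver fetch versus a query that only does work on the server. This is a rough sketch using the `con` and `myTable` names from the query above:

```r
library(DBI)
library(dplyr)

# Time the raw DBI fetch. If this is also ~240s, the bottleneck is the
# driver or the network transfer, not dbplyr's query translation.
system.time(
  raw <- dbGetQuery(con, 'SELECT * FROM "myTable"')
)

# Time a query that returns a single row, so almost no data crosses the
# network. If this is fast, the server is fine and the transfer is slow.
system.time(
  n <- tbl(con, "myTable") %>% summarise(n = n()) %>% collect()
)
```

If the one-row query is fast but the full fetch is slow, the problem is in moving the data, not in running the query.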

#2

Do you have a lot of existing connections to the database? That might affect performance.

There may also be network performance problems if you’re using a corporate network over a VPN.
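If you have access to the server, you can check how many connections are actually open by querying the `pg_stat_activity` system view, e.g. (a sketch, reusing the `con` from the original post):

```r
library(DBI)

# Count sessions per user and state. A pile-up of "idle" sessions can
# point at connections that were opened but never closed.
dbGetQuery(con, "
  SELECT usename, state, count(*) AS n
  FROM pg_stat_activity
  GROUP BY usename, state
  ORDER BY n DESC
")
```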


#3

Thanks, @wolfpack. VPN is not an issue for me, but connections might be. I thought RPostgres closed connections automatically when quitting R, but maybe this was part of the problem.

Here's the non-technical reality: I went to bed, woke up, and the speed issue was resolved. After hours of fiddling to diagnose the cause of low speeds, it was resolved with a good night's sleep. :thinking:


#4

If it's caused by opening too many database connections, you should check out pool. It is a connection management package that handles this automatically. There are some drawbacks, but the documentation lays them out pretty clearly.
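A minimal sketch of what that looks like with RPostgres (the host/dbname/user values here are placeholders):

```r
library(pool)
library(RPostgres)
library(dplyr)

# A pool hands out connections on demand and reclaims idle ones,
# so a script can't leak one connection per query.
pool <- dbPool(
  Postgres(),
  host   = "db.example.com",  # placeholder
  dbname = "mydb",            # placeholder
  user   = "me"               # placeholder
)

# Pool objects can be used where a connection is expected
df <- pool %>% tbl("myTable") %>% collect()

# Close all connections held by the pool when done
poolClose(pool)
```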


#5

No, the RPostgres package does not close connections automatically, and I am not sure how it could. Your Postgres server does kill idle connections, though this takes time to take effect.

I have found it a good policy to end all my database-related R scripts with a call to DBI::dbDisconnect(con). It's a small piece of boilerplate that quickly became a habit, just like naming all my connections con.
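In functions, the same habit can be made robust to errors with `on.exit()`, so the connection is closed even if a query fails mid-script. A sketch (the `dbname` and table are placeholders):

```r
library(DBI)
library(RPostgres)

run_report <- function() {
  con <- dbConnect(Postgres(), dbname = "mydb")  # placeholder dbname
  # Guarantee the disconnect runs even if the query below errors
  on.exit(dbDisconnect(con), add = TRUE)

  dbGetQuery(con, 'SELECT count(*) FROM "myTable"')
}
```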


#6

That must be it then @jlacko. Thanks.