We are a few people working on an RStudio Server instance, using sparklyr to access data and run computations on an HDFS/Spark cluster (dozens of requests per day each).
Our problem is the following: we connect with sparklyr to work, but the connection is never closed when idle, even after an entire weekend. The cluster is not that big, and each connection holds 1 core (we use dynamic allocation) and several GB of RAM that other people cannot use and, more importantly, that night and weekend cron jobs cannot use either.
We have to remember to disconnect manually before leaving, or before switching to something else, which of course is not great, and we easily forget.
I spent some time going through Spark's configuration parameters, but since sparklyr uses spark-shell to connect, it does not seem possible to stop the connection when it is idle (an idle state is hard to recognize from the Spark side), at least to my knowledge.
It seems it should be easier to recognize the idle state inside RStudio (no activity and no running command), and I know such detection exists. So I tried to add:
to the .Last function in .Rprofile, hoping it would disconnect when RStudio goes idle, but it does not seem to work (session-timeout-minutes=20 in /etc/rstudio/rsession.conf).
When I manually run:
rstudio-server force-suspend-session (pid)
it correctly stops the Spark session.
The only workaround I see right now is to run a cron job that forcibly suspends sessions at a fixed hour each day. But it is ugly: we cannot launch big computations at night without modifying the cron job first (it does not detect running computations), and it cannot be used safely during the day (it would interrupt people who may be working).
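For reference, here is a minimal sketch of what that cron workaround could look like. It is not what we currently run. The marker file path, the `SUSPEND_CMD` override and the script location are assumptions made up for illustration; only `rstudio-server force-suspend-all` is a real command. The marker file is a manual opt-out for long jobs, not actual detection of a running computation.

```shell
#!/bin/sh
# Sketch of the cron workaround: suspend all RStudio sessions unless someone opted out.
# Assumptions (hypothetical, for illustration): KEEP_ALIVE_FILE, SUSPEND_CMD, script path.

# Hypothetical marker file: a user creates it before starting a long overnight job.
KEEP_ALIVE_FILE="${KEEP_ALIVE_FILE:-/tmp/rstudio-keep-alive}"
# Command that suspends the sessions; overridable so the script can be dry-run.
SUSPEND_CMD="${SUSPEND_CMD:-rstudio-server force-suspend-all}"

suspend_unless_opted_out() {
    if [ -e "$KEEP_ALIVE_FILE" ]; then
        echo "skipped: $KEEP_ALIVE_FILE exists"
        return 0
    fi
    # Intentionally unquoted so the command string is split into words.
    $SUSPEND_CMD
}

# Only act when called with "run", so the file can be sourced safely.
if [ "${1:-}" = "run" ]; then
    suspend_unless_opted_out
fi

# Example root crontab line, every night at 22:00 (path is hypothetical):
# 0 22 * * * /usr/local/bin/suspend-rstudio.sh run
```

This still has the drawbacks described above: someone has to remember to create the marker file, and during the day it would suspend people regardless of whether they are working.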
My questions are:
- Does an active Spark connection block idle detection and session suspension in RStudio? Why? Can I change that?
- Can I detect an idle state from outside an RStudio session? Or the time the last command was run? Or whether a command is currently running?
- Is there a Spark parameter I missed that would automatically disconnect after a certain period without any job being launched?