Trouble Plotting 20G Data

Hi Community.

Newbie here (Marcelo)

Thanks in advance for the support, we all know how valuable it is. Many thanks.

I am a Linux engineer who supports an RStudio host. I have no knowledge of RStudio and no access to the graphical interface.

The user cannot plot when using a large amount of data. 10 numbers is OK, 10K numbers is OK, and 10G numbers is OK, but with the entire 20G file RStudio chokes, does not return, and needs to be restarted.

I did not find any log in /var/log referring to RStudio.

Questions: Where are the RStudio logs located? Can we access them through the GUI? If so, do you have detailed steps? I do not have access to the GUI and will have to instruct the user.

If anyone has a path to a solution, please advise.

Many thanks,

Marcelo Carvalho | Sr. Linux AWS Systems Engineer

RStudio Server writes its messages to the system log. On Debian/Ubuntu systems this will typically be located at:

/var/log/syslog

On RedHat/CentOS systems this will typically be located at:

/var/log/messages

That applies to RStudio Server Professional Edition 1.3.820-1. Which server version are you using, and on which OS?

Also, if more memory can be made available, increase the ulimit for the user under which the server runs to “unlimited.”
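Before raising anything, it may help to see which limits are actually in force. The following is a minimal sketch, assuming a bash or dash shell run as the user who owns the R session; the function name `show_limit` is made up for illustration.

```shell
#!/bin/sh
# Print the soft and hard value of one resource limit for the current shell.
# $1 is a ulimit flag letter, e.g. v (virtual memory) or d (data segment).
# show_limit is a hypothetical helper name, not part of any tool.
show_limit() {
    flag="$1"
    soft=$(ulimit -S "-$flag")
    hard=$(ulimit -H "-$flag")
    printf '%s soft=%s hard=%s\n' "$flag" "$soft" "$hard"
}

show_limit v   # virtual memory
show_limit d   # data segment size
```

For a session that is already running, `cat /proc/<pid>/limits` shows the limits that specific process has, which can differ from those of an interactive login shell.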

FYI......

Not much in the logs here:

/home//.rstudio-desktop/log

ls -l *
-rw-rw-r-- 1 0 Jun 5 2019 rdesktop.log
-rw-rw-r-- 1 868 Jun 5 2019 rsession-mmustafa.log

Sorry, I took RStudio host to mean a server, rather than a desktop client.

So, the log location depends on the OS under which the desktop program is running.

On my Pop!_OS 20.10 (an Ubuntu work-alike), the files are in ~/.local/share/rstudio/log.

Log file: /home/roc/.local/share/rstudio/log/rsession-roc.log
--------------------------------------------------

2022-01-25T21:15:10.201400Z [rsession-roc] ERROR r error 4 (R code execution error) [errormsg: subscript out of bounds]; OCCURRED AT rstudio::core::Error rstudio::r::exec::executeSafely(rstudio_boost::function<void()>) src/cpp/r/RExec.cpp:252; LOGGED FROM: void rstudio::session::{anonymous}::processEvents() src/cpp/session/SessionHttpMethods.cpp:114
2022-01-25T21:15:10.201400Z [rsession-roc] ERROR r error 4 (R code execution error) [errormsg: subscript out of bounds]; OCCURRED AT rstudio::core::Error rstudio::r::exec::executeSafely(rstudio_boost::function<void()>) src/cpp/r/RExec.cpp:252; LOGGED FROM: void rstudio::session::{anonymous}::processEvents() src/cpp/session/SessionHttpMethods.cpp:114
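Since the engineer only has shell access, the same kind of lines can be pulled out without the GUI. This is a rough sketch, assuming the log entries carry the word ERROR between spaces as in the sample above; `show_errors` is a hypothetical helper, and the two directories are simply the ones mentioned in this thread.

```shell
#!/bin/sh
# Print every ERROR line from the *.log files in one directory,
# prefixed with the file name. show_errors is a made-up helper name.
show_errors() {
    logdir="$1"
    if [ ! -d "$logdir" ]; then
        echo "no such directory: $logdir" >&2
        return 1
    fi
    for f in "$logdir"/*.log; do
        [ -f "$f" ] || continue
        # -H prefixes each matching line with the file name
        grep -H ' ERROR ' "$f"
    done
    return 0
}

# Candidate locations taken from this thread; run as the affected user
# or substitute that user's home directory.
for d in "$HOME/.local/share/rstudio/log" "$HOME/.rstudio-desktop/log"; do
    if [ -d "$d" ]; then
        show_errors "$d"
    fi
done
```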

Where in your overall system is the user's RStudio application installed, generally, and can you get the user to send you the output produced from the RStudio menu bar with Help | Diagnostics | Write Diagnostics Report?

Nothing there about RStudio.

file /var/log/syslog

/var/log/syslog: ASCII text

grep -i rstudio /var/log/syslog

(Nada... )
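A single grep on the current syslog misses rotated and compressed files, and messages filed under the session binary's name (rsession). A sketch of a wider search, assuming gzip is installed for the `.gz` files; `search_logs` is a hypothetical helper name.

```shell
#!/bin/sh
# For every regular file directly under a directory, print
# "file: count" when it contains case-insensitive matches of a pattern.
# Files ending in .gz are read through gzip -dc.
# search_logs is a made-up helper name.
search_logs() {
    dir="$1"
    pattern="$2"
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        case "$f" in
            *.gz) n=$(gzip -dc "$f" 2>/dev/null | grep -ic -- "$pattern") ;;
            *)    n=$(grep -ic -- "$pattern" "$f" 2>/dev/null) ;;
        esac
        if [ "${n:-0}" -gt 0 ]; then
            printf '%s: %s\n' "$f" "$n"
        fi
    done
    return 0
}

# Example calls (run as root so all logs are readable):
#   search_logs /var/log rstudio
#   search_logs /var/log rsession
```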

From yesterday, the day we were troubleshooting, there is very little, just lines like this:

..........[495574]: ERROR session hadabend; LOGGED FROM: rstudio::core::Error {anonymous}::rInit(const rstudio::r::session::RInitInfo&) src/cpp/session/SessionMain.cpp:680.....

Does this ring a bell?

No other log error line referring to RStudio

Checking ulimit.

Many thanks

Marcelo_

I guess we are talking about an RStudio Server host on which the user also runs the client.

Single user, single host, RStudio installed, and 512G RAM.

Host is running NAME="Ubuntu"
VERSION="20.04.2 LTS (Focal Fossa)"

Working on ulimit

Many thanks,

Marcelo_

Does it really make sense to load 20GB worth of data into memory?

I wonder if an approach like the one discussed in Bigger data would be better? It shows an example of how to work with 37 GB of data on a laptop with 16 GB of RAM.

Yes, most plots typically make use of much less data than that, since our visual perception, let alone ability to distinguish and interpret information, has much lower bandwidth.

Typically, a multi-GB data set would be summarized before a plotting step, since we can't visually process even 1 million things, let alone 20 billion.
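One way to do that summarizing outside R, before the data ever reaches the plot, is to collapse the file into one row per block of values. This is only a sketch: it assumes the file holds one numeric value per line (the real layout was not stated in the thread), and `summarize_blocks` is a made-up name.

```shell
#!/bin/sh
# Read one number per line on stdin and print "min mean max" for each
# block of N consecutive lines; a final partial block is also printed.
# summarize_blocks is a hypothetical helper name.
summarize_blocks() {
    awk -v n="$1" '
        {
            v = $1 + 0
            if (count == 0) { min = v; max = v; sum = 0 }
            if (v < min) min = v
            if (v > max) max = v
            sum += v
            count++
            if (count == n) { print min, sum / n, max; count = 0 }
        }
        END { if (count > 0) print min, sum / count, max }
    '
}

# Example: summarize_blocks 1000000 < big_file.txt > summary.txt
```

With a block size of one million, 20 billion values would shrink to roughly 20 thousand rows, which is a size a plot can actually show.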

Below are the log lines I found from the last attempt to replicate the error.

Can we make any sense out of it?
Is there anything the log points to that we can adjust?
It mentions: **is_closing_session(): no DBUS_SESSION_BUS_ADDRESS in environment**

Please advise

Many thanks

Marcelo_

root@atrssflinvm01:/var/log/BAK# grep 2035345 *
apport.log.1:ERROR: apport (pid 2035345) Tue Feb 1 12:51:02 2022: called for pid 2024533, signal 11, core limit 0, dump mode 1
apport.log.1:ERROR: apport (pid 2035345) Tue Feb 1 12:51:02 2022: executable: /usr/lib/rstudio-server/bin/rsession (command line "/usr/lib/rstudio-server/bin/rsession -u xxxxxxxx --session-use-secure-cookies 0 --session-root-path / --session-same-site 0 --launcher-token XXXXXXXX --r-restore-workspace 2 --r-run-rprofile 2")
apport.log.1:ERROR: apport (pid 2035345) Tue Feb 1 12:51:02 2022: is_closing_session(): no DBUS_SESSION_BUS_ADDRESS in environment
apport.log.1:ERROR: apport (pid 2035345) Tue Feb 1 14:06:49 2022: wrote report /var/crash/_usr_lib_rstudio-server_bin_rsession.1002.crash
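A note on reading these lines: on Linux, signal 11 is SIGSEGV (a segmentation fault), and "core limit 0" means no core dump was allowed. The is_closing_session() message appears to be logged by apport itself while it handles the crash, so it may be a side effect rather than the cause. The sketch below just extracts the key fields from such lines; `parse_apport` is a hypothetical helper name.

```shell
#!/bin/sh
# Read apport log lines on stdin and, for each "called for pid" line,
# print the crashed PID, the signal number and the core limit.
# parse_apport is a made-up helper name.
parse_apport() {
    sed -n 's/.*called for pid \([0-9][0-9]*\), signal \([0-9][0-9]*\), core limit \([0-9][0-9]*\).*/pid=\1 signal=\2 core_limit=\3/p'
}

# Example: parse_apport < /var/log/apport.log.1
```

If the apport tools are installed, `apport-unpack <report>.crash <target-dir>` should unpack the report written to /var/crash into separate files for closer inspection.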

FYI................

We have this:

echo $(pgrep gnome-session)
1525

echo $(dbus-launch)
DBUS_SESSION_BUS_ADDRESS=unix:abstract=/tmp/dbus-Z722UHaSOv,guid=200a9315bb073776a56064bc61fab131 DBUS_SESSION_BUS_PID=2245146

echo $DBUS_SESSION_BUS_ADDRESS <<<<< Returned empty

Question: **Would the following be a solution?** (I only have access to the OS and cannot try it myself; I would have to bring in the user, who is very busy. I am trying to have some solution ready before bringing the user back to the table.)
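# Use the oldest gnome-session PID (pgrep may return several) and read the variable from its environment
pid_gnome=$(pgrep -o gnome-session)
DBUS_SESSION_BUS_ADDRESS=$(tr '\0' '\n' < /proc/${pid_gnome}/environ | grep '^DBUS_SESSION_BUS_ADDRESS=' | cut -d= -f2-)
export DBUS_SESSION_BUS_ADDRESS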

Please advise.

Many thanks

Marcelo_

By the way:

A reminder that plotting works for data much smaller than 20G.

The error occurs only when trying to process the 20G data file.

Many thanks,

Marcelo_

See this S/O thread. dbus provides interprocess communications on desktop environments. I can speculate that RStudio Server doesn't need to call it except in unusual circumstances, such as chewing very large data, so that is why small plots work.

I can't get a link to the appropriate RStudio Server GitHub page (GitHub is having issues this morning), but I'll flag this to see if anyone else has ideas.

While the causes are being considered, consider sampling the data and using bootstrap or Monte Carlo techniques to get a handle on the data.
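As a starting point for the sampling idea, a plain random subsample (not a full bootstrap, which resamples with replacement) can be cut from the file at the shell before R ever loads it. This sketch assumes a text file whose first line is a header; `sample_lines` and the seed argument are made up for illustration.

```shell
#!/bin/sh
# Read lines on stdin, always keep the first (header) line, and keep
# each later line with probability p. An optional second argument
# seeds the random generator so a run can be repeated.
# sample_lines is a hypothetical helper name.
sample_lines() {
    awk -v p="$1" -v seed="${2:-1}" '
        BEGIN { srand(seed) }
        NR == 1 { print; next }
        rand() < p { print }
    '
}

# Example: keep about 0.1% of the rows
#   sample_lines 0.001 42 < big_file.csv > sample.csv
```

Plotting the sample first would also show whether the crash depends on the data volume alone or on something specific in the full file.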
