Can we host and access datasets on the local storage of an RStudio Connect server?

Greetings,

Apologies if this has been addressed elsewhere; if so, I'm having trouble finding it.

We have RStudio Connect deployed on a server, and many of the assets we will be publishing to rsconnect will need access to different slices of a large dataset, which is stored (more or less) as a single HDF5 file.

I was wondering if we could host this HDF5 file on the local HD of the server that is serving rsconnect, and have our published assets reference that dataset by its local path on that server.

Could we, for instance, copy the file somewhere in RStudio Connect's Server.DataDir directory, and then have our assets (1) detect whether they are running within RStudio Connect; and (2) if so, build a path to the HDF5 file in Server.DataDir, or something along those lines?

For instance, if we assume Server.DataDir is kept to its default value of /var/lib/rstudio-connect, could I just have our server admin drop my data.h5 file in there and do something like this somewhere in a shiny app I want to deploy to rsconnect?

if (Sys.getenv("R_CONFIG_ACTIVE") == "rsconnect") {
  h5.path <- "/var/lib/rstudio-connect/data.h5"
} else {
  h5.path <- "/somewhere/on/my/laptop/data.h5"
}

dat <- rhdf5::h5read(h5.path, "foo/A")

If not, is there something else we could do along those lines? We wouldn't need any type of security / user access restriction around accessing that particular HDF5 file, and the assets we publish to rsconnect would not need to modify or write to that file either.

I'm not the admin of the server, so can't play with different flavors of what I'm asking / proposing here, so any insight anyone can provide that I could pass along would be greatly appreciated.

Thanks!
-Steve

Hey @lianos ! Great question!! Our usual recommendation for this type of thing is to create a top-level directory at the root of the server (e.g. /my-data) and house the data there. That way you avoid the sandboxing that Connect applies to the processes it runs.

We have an article on persistent storage that addresses this topic in more detail. I hope it helps!! Please feel free to ask if my response or the article prompts any more questions! :grinning_face_with_smiling_eyes: (You want the section on "Absolute References," I suspect. Also, I believe this article pre-dates pins, which are useful for small datasets.)
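
To make the top-level directory suggestion a bit more concrete, here is a minimal sketch of how an app might pick the file location. It assumes the admin has created /my-data and placed data.h5 there (both names are just for illustration), and it reuses the R_CONFIG_ACTIVE check from the question. The helper names resolve_h5_path and assert_readable are hypothetical, not part of Connect or rhdf5.

```r
# Hypothetical helper: choose the HDF5 path based on where the code runs.
# Assumes the file lives at /my-data/data.h5 on the Connect server
# (illustrative path) and that R_CONFIG_ACTIVE is "rsconnect" there,
# as in the original question.
resolve_h5_path <- function(connect_path = "/my-data/data.h5",
                            local_path = "/somewhere/on/my/laptop/data.h5") {
  on_connect <- identical(Sys.getenv("R_CONFIG_ACTIVE"), "rsconnect")
  if (on_connect) connect_path else local_path
}

# Hypothetical helper: stop with a readable message if the file cannot
# be read by the user running the content (file.access returns 0 on success).
assert_readable <- function(path) {
  if (file.access(path, mode = 4) != 0) {
    stop("Cannot read HDF5 file at: ", path)
  }
  invisible(path)
}

# Usage inside the app (left commented out, since it needs the real file):
# h5.path <- assert_readable(resolve_h5_path())
# dat <- rhdf5::h5read(h5.path, "foo/A")
```

The readability check is there because the content runs as whatever user Connect is configured to use, so your admin would need to make sure that user has read permission on the directory and the file.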

Oh, nice! Thanks for the speedy reply, this Persistent Storage article you linked is super helpful.

I'll let you know if we run into any bumps in the road.

Thanks again!
