Launching localhost web apps in RStudio Cloud


#1

Dear Rstudio.cloud team!

First of all, thank you for the super awesome product you guys have put together. This is simply amazing and very useful for teaching data science.

I am wondering whether it is possible to systematically access localhost on an Rstudio.cloud instance. I see that Shiny, for example, launches in a special container, e.g.
https://username.rstudio.cloud/faf8d125351345cb9fd5a3b33f8c1531/?view=shiny
or
https://username.rstudio.cloud/faf8d125351345cb9fd5a3b33f8c1531/p/6105/
whereas on my local machine it would be at http://127.0.0.1:7444/

This tells me that there is no simple way of opening something like localhost:54321, which is what I need for, say, the web interface to an H2O cluster I managed to launch in my space. I understand that you guys need to write a custom wrapper for every application (port?) to re-expose it in Rstudio.cloud, and that there is no way I can just launch an arbitrary web app from the cloud instance in a browser session. I did not test TensorFlow, but there you have TensorBoard, which is also a localhost web app. Do you guys have to explicitly map ports to folders to enable web access? How about mapping 54321 for H2O Flow, please?


#2

These URLs follow a predictable pattern:

The /p/6105 bit means port 6105, or localhost:6105. So for H2O Flow, you could use:

https://username.rstudio.cloud/faf8d125351345cb9fd5a3b33f8c1531/p/54321/

Note, however, that for security reasons this feature is being deprecated, so we wouldn't recommend building infrastructure on it. In a future RStudio release, you will need to use the rstudioapi::translateLocalUrl() function to create these URLs.
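
As a rough illustration of the URL pattern described above, here is a minimal sketch in base R. The helper name proxied_url and its arguments are made up for this example and are not part of any RStudio API; the session URL is the one quoted earlier in this thread. Because the /p/<port>/ form is being deprecated, the guarded rstudioapi::translateLocalUrl() call at the end is the route to prefer inside a running RStudio session.

```r
# Hypothetical helper (not an RStudio API): builds the /p/<port>/ form
# of a session URL as described in this thread.
proxied_url <- function(session_url, port, path = "") {
  session_url <- sub("/+$", "", session_url)  # drop trailing slashes
  path <- sub("^/+", "", path)                # drop leading slashes
  paste0(session_url, "/p/", as.integer(port), "/", path)
}

session <- "https://username.rstudio.cloud/faf8d125351345cb9fd5a3b33f8c1531"
proxied_url(session, 54321)
# "https://username.rstudio.cloud/faf8d125351345cb9fd5a3b33f8c1531/p/54321/"

# Preferred inside RStudio: let the IDE translate the local URL itself.
# The guard keeps this from running outside an RStudio session.
if (requireNamespace("rstudioapi", quietly = TRUE) &&
    rstudioapi::isAvailable() &&
    rstudioapi::hasFun("translateLocalUrl")) {
  print(rstudioapi::translateLocalUrl("http://localhost:54321/", absolute = TRUE))
}
```

The translated URL may not contain the literal port number, so treat the hand-built form only as a way to understand the pattern.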


#3

Pretty cool, thanks. There's something going on with the H2O web interface not being able to talk to the Java cluster, but I guess it is a Java security issue rather than anything related to R.

Error calling GET /3/Models
HTTP connection failure: status=error, code=404, error=Not Found

I will file a ticket with them, but eventually you guys may want to talk to each other and/or establish some rules for making "Rstudio.cloud compatible" web front ends for processes running in the background. I understand that Rstudio.cloud is not a VM service and may have some limitations. Other than the web interface, everything is running flawlessly and I couldn't be happier! Thanks for making this awesome product!

Reprex is actually very easy:

library(h2o)
h2o.init()

After this, the cluster should be available at localhost:54321, and it is. But none of the commands work. Trying Admin > ClusterStatus fails with the error above.


#4

What is the output of h2o.clusterStatus(), right after calling h2o.init()?

You may need to explicitly limit the memory that h2o consumes. I don't believe it properly detects the cgroup limit; instead it looks at the host machine's total memory and attempts to start a JVM with too high a maximum memory.

That said, I still can't get the Flow interface to properly launch, fiddling with it...
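
To check the cgroup point above, a minimal base R sketch along these lines could read the container's memory limit and turn it into a value for max_mem_size. The helper names, the two cgroup file paths (v2 first, then v1), and the 0.6 headroom fraction are all assumptions for illustration; the paths depend on how the container is set up.

```r
# Hypothetical helper: returns the cgroup memory limit in bytes, or NA
# if no numeric limit is found. Both paths are assumptions about the host.
cgroup_memory_limit <- function(paths = c(
    "/sys/fs/cgroup/memory.max",                     # cgroup v2
    "/sys/fs/cgroup/memory/memory.limit_in_bytes"    # cgroup v1
  )) {
  for (p in paths) {
    if (file.exists(p)) {
      value <- trimws(readLines(p, n = 1, warn = FALSE))
      # cgroup v2 reports "max" when there is no limit; skip non-numeric values
      if (length(value) == 1 && grepl("^[0-9]+$", value)) {
        return(as.numeric(value))
      }
    }
  }
  NA_real_
}

# Hypothetical helper: a fraction of the limit, formatted like "600M".
heap_arg <- function(limit_bytes, fraction = 0.6) {
  sprintf("%.0fM", floor(limit_bytes * fraction / 1024^2))
}

heap_arg(1024^3)
# "614M"

limit <- cgroup_memory_limit()
if (!is.na(limit)) {
  message("Suggested call: h2o.init(max_mem_size = \"", heap_arg(limit), "\")")
}
```

The suggestion is only printed, not executed, so the sketch runs without h2o installed.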


#5

Flow is available at .../p/54321/flow/index.html

I opened a ticket in H2O Jira: PUBDEV-6187


#6

Okay, I was able to launch Flow and got the same error you did.

I tested on a regular RStudio Server Pro instance and saw the same behavior, so this is not an issue with rstudio.cloud itself. H2O Flow appears to make its REST calls with an absolute path, so that only the protocol and server are prepended. That won't work when Flow is actually being served with leading path elements.

If you do open a ticket, please link it here, I would be interested in following it.
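
The failure described above can be sketched in a few lines of base R. The helper resolve_absolute_path is hypothetical and only mimics how a browser resolves a reference that starts with "/": it keeps the protocol and server of the page and discards the rest of the path. The URLs and the /3/Models endpoint are the ones quoted in this thread.

```r
# Hypothetical helper: resolve a reference starting with "/" against a
# page URL the way a browser does, keeping only protocol and server.
resolve_absolute_path <- function(page_url, path) {
  stopifnot(startsWith(path, "/"))
  origin <- sub("^(https?://[^/]+).*$", "\\1", page_url)
  paste0(origin, path)
}

# Served directly: the REST call lands on the H2O port, as intended.
resolve_absolute_path("http://localhost:54321/flow/index.html", "/3/Models")
# "http://localhost:54321/3/Models"

# Served behind the /<session>/p/54321/ prefix: the prefix is lost,
# so the request never reaches H2O.
proxied <- "https://username.rstudio.cloud/faf8d125351345cb9fd5a3b33f8c1531/p/54321/flow/index.html"
resolve_absolute_path(proxied, "/3/Models")
# "https://username.rstudio.cloud/3/Models"
```

The second result has no session id and no /p/54321/ segment, which is consistent with the 404 on GET /3/Models reported earlier.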


#7

If I don't specify anything, it allocates 0.24 GB:

Starting H2O JVM and connecting: . Connection successful!

R is connected to the H2O cluster: 
    H2O cluster uptime:         2 seconds 711 milliseconds 
    H2O cluster timezone:       Etc/UTC 
    H2O data parsing timezone:  UTC 
    H2O cluster version:        3.22.1.1 
    H2O cluster version age:    12 days  
    H2O cluster name:           H2O_started_from_R_rstudio-user_zdn362 
    H2O cluster total nodes:    1 
    H2O cluster total memory:   0.24 GB 
    H2O cluster total cores:    1 
    H2O cluster allowed cores:  1 
    H2O cluster healthy:        TRUE 
    H2O Connection ip:          localhost 
    H2O Connection port:        54321 
    H2O Connection proxy:       NA 
    H2O Internal Security:      FALSE 
    H2O API Extensions:         XGBoost, Algos, AutoML, Core V3, Core V4 
    R Version:                  R version 3.5.0 (2018-04-23) 

I remember seeing 700 MB allocated by default yesterday. Strange. I can still override the default and allocate, say, 600 MB like so:

library(h2o)
h2o.init(max_mem_size = "600M")
h2o.clusterInfo()

#> R is connected to the H2O cluster: 
#>     H2O cluster uptime:         2 seconds 577 milliseconds 
#>     H2O cluster timezone:       Etc/UTC 
#>     H2O data parsing timezone:  UTC 
#>     H2O cluster version:        3.22.1.1 
#>     H2O cluster version age:    12 days  
#>     H2O cluster name:           H2O_started_from_R_rstudio-user_lau902 
#>     H2O cluster total nodes:    1 
#>     H2O cluster total memory:   0.57 GB 
#>     H2O cluster total cores:    1 
#>     H2O cluster allowed cores:  1 
#>     H2O cluster healthy:        TRUE 
#>     H2O Connection ip:          localhost 
#>     H2O Connection port:        54321 
#>     H2O Connection proxy:       NA 
#>     H2O Internal Security:      FALSE 
#>     H2O API Extensions:         XGBoost, Algos, AutoML, Core V3, Core V4 
#>     R Version:                  R version 3.5.0 (2018-04-23) 

The Flow interface does not seem to be getting through to the JVM.


#8

Perhaps a difference between 3.22.1.1 and 3.20.0.8 (what is installed from the CRAN version of h2o)?


#9

The H2O R package doesn't have a default value for the amount of Java memory to allocate, so the H2O backend starts with the JVM's default.

You guys are using OpenJDK 8, in which case it will take 1/4 of the available memory, which might be too little for H2O. Anything less than 1 GB will probably be too little; 1 GB is probably fine for experimenting.
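
As a back-of-the-envelope check of the quarter-of-memory default mentioned above, the arithmetic can be sketched in base R. The helper names are hypothetical, and the assumption that the JVM applies exactly a 1/4 ratio to whatever memory it sees is a simplification.

```r
# Assumption: default max heap = 1/4 of the memory the JVM sees.
default_heap_gb <- function(visible_gb) visible_gb / 4

# Inverse: how much memory the JVM apparently saw, given the heap it chose.
visible_memory_gb <- function(heap_gb) heap_gb * 4

default_heap_gb(1)        # 0.25
visible_memory_gb(0.24)   # 0.96
```

Under that assumption, the 0.24 GB reported earlier in the thread would correspond to roughly 1 GB of memory visible to the JVM.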

I am trying to reproduce the issue and access Flow. The example URL in this discussion looks like this:

https://username.rstudio.cloud/faf8d125351345cb9fd5a3b33f8c1531/p/54321/

Where do I get the session id (??) faf8d125351345cb9fd5a3b33f8c1531?


#10

I am able to get to Flow from the terminal: curl -v http://localhost:54321/flow/index.html

One thing to note is that recent versions of H2O only allow connections to Flow from localhost (by default).

If you want to open H2O to the world you can do it by specifying:

h2o.init(bind_to_localhost=FALSE)

This is generally a bad idea unless there is some other security in place.


#11

Hi Michal,

Thank you for looking into this. You can get the session id by calling

browseURL("localhost:54321")

This will open a browser tab with the session id but the wrong path to Flow. You can then correct it as per the template above (add /p/54321/flow/index.html).


#12

Okay, I figured out how to get to Flow:

h2o.init()

rstudioapi::translateLocalUrl(url="xxxlocalhost:54321/flow/index.html", absolute = TRUE)
[1] "xxxuser-name.rstudio.cloud/6920fb74abf147369cc27a4632da1c19/p/5a31f987/flow/index.html"
(xxx=https://)

Open this URL in your browser.

This will display Flow but it won't let you do anything because Flow doesn't expect to work behind a proxy like that. I think we can fix that and make it work on RStudio Cloud.

Thanks for reporting in https://0xdata.atlassian.net/browse/PUBDEV-6187 - we will try to do this asap.


#13

Thank you for the prompt turnaround, Michal. This is very useful.

"Flow doesn't expect to work behind a proxy"

This reminded me of another issue, PUBDEV-5602, that I figured out recently. Clearing all proxy settings in the R session allowed me to connect to Flow on my corporate machine. That issue is, I am sure, unrelated, but generally speaking some clarification regarding the use of H2O behind a proxy would be useful.

