Preventing products (Rmd, Shiny apps) from requesting third-party data (and thereby communicating the user's IP address)

We've set up an RStudio Connect server at our institution and would like to embed some of the visualisations hosted on it in our home page via an iframe. Our IT department observed that one of our visualisations loads a JavaScript library from mathjax.rstudio.com. In theory, RStudio is now in possession of the user's IP address. This goes against our policy: it is prohibited to send the IP addresses of users visiting our site to third parties.

Is there a way to automatically check whether each application on RStudio Connect retrieves data from third-party sites?

You can host your own copy of MathJax by specifying an alternate URL as described here: https://bookdown.org/yihui/rmarkdown/html-document.html#mathjax-equations

Thanks for this quick response! Yes, that would be a solution to this specific problem, but is there a way to solve the problem generically in order to put our IT and legal services at ease? Something to prevent products hosted on RSC from connecting to third-party services?

There are two types of outgoing connection that seem to be a bit conflated here:

  • RStudio Connect products connecting to third party services (i.e. from Connect, on the "server side")
  • The user's browser connecting to third party services (i.e. when the user loads the visualization from their desktop)

The former is usually solved with a firewall that prevents Connect from talking to outside services. In this type of setup, package management typically becomes a problem, and RStudio Package Manager fits well as a solution.

The latter is much trickier, because the user probably has access to the internet (i.e. if you visit google.com or rstudio.com in your browser, you are sending your public IP address to those websites already). You can put the user behind a firewall that prevents them from accessing the internet, but that isn't as common as the above.

It sounds like the latter is your concern. Do users have access to the internet? Is it a problem if the user goes to rstudio.com and gives their public IP address away anyways? Or is it just a problem if your content calls out to rstudio.com (again, from their desktop, not the Connect server) without their knowledge?

EDIT: Also, if your policy is against sending IP addresses to a third party, what is meant by "sending?" You are not "sending an IP address" here. You are asking the user's browser to load content from a third party... the user's computer is making a request from that third party, which includes their public IP address. If that policy is as strict as it sounds, it seems like the user should not have access to the internet... but that's just my read :slight_smile:

I.e. contrast this with me figuring out all of the (private) IP addresses on my network and then sending a list of them to my buddy. Private IP address != Public IP address.
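To make the private vs. public distinction concrete, here is a minimal sketch using Python's standard `ipaddress` module. The function name `address_scope` and the sample addresses are my own illustration, not anything Connect provides.

```python
import ipaddress


def address_scope(addr):
    """Label an address as 'private' or 'public'.

    ip.is_private is True for ranges reserved for private or other
    special use (for example 10.0.0.0/8 and 192.168.0.0/16); anything
    else is treated here as a publicly routable address.
    """
    ip = ipaddress.ip_address(addr)
    return "private" if ip.is_private else "public"


if __name__ == "__main__":
    # Illustrative addresses only: a typical LAN address and a
    # well-known public DNS resolver.
    for sample in ("192.168.1.10", "8.8.8.8"):
        print(sample, address_scope(sample))
```

A third-party server only ever sees the public address the request arrives from, never the private one assigned on the local network.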

You laid out the matter very nicely and clarified quite a few points. I might have to specify the issue a bit more on a certain point:

So "the users" can be anybody with an internet connection visiting our institution's website. The data protection policy states that if such a user visits our institution's website, their IP address should only be communicated to us. If the user's IP address is somehow made known to a third party without the user's explicit consent, the policy is violated. This is the case, for example, if we use third-party resources such as JS libraries or map tiles and the user's browser requests them.

Thank you for clarifying the point of public vs. private IP address, I will definitely communicate this point to our legal services.

That makes sense, thanks! If it is possible to share here what they come back with: if this refers to both public and private IP, for instance, that would be helpful! I have honestly never heard of such a use case with respect to public IP.

It is an interesting one to keep in mind, though! It seems unlikely that Connect would be able to enforce this, since there are lots of subtle ways that even your own users could write content that requests third-party resources. However, some type of tooling to help R developers or admins audit / detect / remedy such behavior would be nice. We can, at the very least, record the use case in our feature tracker to keep in mind! Thanks for sharing!
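As a rough idea of what such an audit could look like, here is a minimal sketch in Python (standard library only) that scans a rendered HTML page for resources the browser would fetch from hosts other than your own. This is not a Connect feature; the function name `third_party_hosts`, the host `connect.example.org`, and the sample page are all hypothetical. It only inspects static HTML, so requests made later by JavaScript (map tiles, for example) would not show up.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Tags whose URL attribute makes the browser fetch a resource on page load.
# Plain <a href> links are left out: they are only followed on a click.
RESOURCE_ATTRS = {
    "script": "src",
    "img": "src",
    "iframe": "src",
    "embed": "src",
    "audio": "src",
    "video": "src",
    "source": "src",
    "link": "href",
}


class _ResourceCollector(HTMLParser):
    """Collect the URLs of resources referenced by the tags above."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        wanted = RESOURCE_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                self.urls.append(value)


def third_party_hosts(html, first_party_hosts):
    """Return the sorted hostnames referenced in html that are not first party."""
    collector = _ResourceCollector()
    collector.feed(html)
    allowed = {host.lower() for host in first_party_hosts}
    found = set()
    for url in collector.urls:
        host = urlparse(url).hostname  # None for relative URLs and data: URIs
        if host and host not in allowed:
            found.add(host)
    return sorted(found)


if __name__ == "__main__":
    # Made-up page: one external script, one relative image, one
    # first-party stylesheet, and an ordinary link.
    page = (
        '<script src="https://mathjax.rstudio.com/latest/MathJax.js"></script>'
        '<img src="plots/figure.png">'
        '<link rel="stylesheet" href="//connect.example.org/style.css">'
        '<a href="https://www.rstudio.com">RStudio</a>'
    )
    print(third_party_hosts(page, ["connect.example.org"]))
```

An admin could run something like this over the rendered output of each deployed document and flag anything that comes back non-empty for review.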

> If it is possible to share here what they come back with: if this refers to both public and private IP, for instance, that would be helpful!

I'll get back to you on this.

> However, some type of tooling to help R developers or admins audit / detect / remedy such behavior would be nice.

Yes, that would be helpful. However, I've received a response from our institution's legal team: we explicitly safeguard ourselves against this very problem by stating the following on our website:

> External content from YouTube, Vimeo, SRF, Issuu, SoundCloud, SlideShare and Google Maps is displayed on our website via iframe and other tools. The IP address is transmitted and the content provider can set cookies, etc. If the website visitor is logged in to the network of the respective third party provider at the same time, a visit to the website may be assigned to the user’s account, depending on the provider. The ZHAW has no control over the manner in which the data are transmitted and has a legitimate interest in integrating this external content.

So this is a known issue, and RStudio Connect applications are far from the only "culprits" when it comes to retrieving third-party data. Thanks for your help on this!

My pleasure! Glad to hear my musings were helpful :slight_smile: Please do let us know if you have any other questions! If you need to ask / share information that isn't fitting for a public forum, feel free to open a support ticket!
