Error launching session because of mounting failure

Hi,

Every time we try to launch a session, it remains in "Pending" status with the following message:

MountVolume.SetUp failed for volume "mount0" : mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t nfs :/home/azureuser /var/lib/kubelet/pods/ad918935-f1a9-4685-a42a-119968a73d9e/volumes/kubernetes.io~nfs/mount0
Output: mount.nfs: access denied by server while mounting :/home/azureuser

The file "launcher-mounts" is:

MountType: NFS
Host:
Path: /home/{USER}
MountPath: /home/{USER}
ReadOnly: false
Cluster: Kubernetes

The output from sudo rstudio-server verify-installation --verify-user=azureuser is:

TTY detected. Printing informational message about logging configuration. Logging configuration loaded from '/etc/rstudio/logging.conf'. Logging to '/var/log/rstudio/rstudio-server/rserver.log'.
Checking Job Launcher configuration...
Ensuring server-user is a Job Launcher admin...
Getting list of Job Launcher clusters...
Job launcher configured with the following clusters: Kubernetes
launcher-adhoc-clusters is empty - all clusters may be used to launch adhoc jobs
launcher-sessions-clusters is empty - all clusters may be used to launch session jobs
Verify Installation Failed: system error 71 (Protocol error) [description: No home directory mount configured for cluster Kubernetes and workbench RStudio - you must configure a home directory mount in /etc/rstudio/launcher-mounts for sessions to properly load user home directories]; OCCURRED AT rstudio::core::Error rstudio::server::session_proxy::overlay::{anonymous}::verifyClusterMounts(const std::set<std::__cxx11::basic_string >&, const std::vectorrstudio::server::job_launcher::MountConfigEntry&, rstudio::server::job_launcher::WorkbenchScope) src/cpp/server/JobLauncherVerification.cpp:314
2022-07-29T11:55:30.420638Z [rserver] ERROR system error 71 (Protocol error) [description: No home directory mount configured for cluster Kubernetes and workbench RStudio - you must configure a home directory mount in /etc/rstudio/launcher-mounts for sessions to properly load user home directories]; OCCURRED AT rstudio::core::Error rstudio::server::session_proxy::overlay::{anonymous}::verifyClusterMounts(const std::set<std::__cxx11::basic_string >&, const std::vectorrstudio::server::job_launcher::MountConfigEntry&, rstudio::server::job_launcher::WorkbenchScope) src/cpp/server/JobLauncherVerification.cpp:314; LOGGED FROM: int main(int, char* const*) src/cpp/server/ServerMain.cpp:872


  • The NFS server is an Azure VM and the pod is able to reach the needed ports.
  • I was able to mount the home directory on the server where we have RStudio Workbench and the Launcher.
  • It is an AKS cluster with an Azure CNI network configuration.

Thank you in advance!

Best Regards,
Dumitru

Interesting! Thanks for reporting this! Is your Workbench server living outside the AKS cluster?

This seems like a problem I would forward to the support team so that we can help more directly! You can do that by sending an email to support@rstudio.com with a link to this discussion.

Have you verified the exports on the NFS server? showmount -e?
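For reference, the server-side exports would look something like this (a hypothetical /etc/exports entry, assuming a 10.0.0.0/16 VNET address range; adjust the path and network to your setup):

```
# /etc/exports on the NFS host (hypothetical subnet; adjust to your VNET range)
/home/azureuser  10.0.0.0/16(rw,sync,no_subtree_check)
```

After editing, sudo exportfs -ra reloads the exports and showmount -e localhost shows what is currently exported. If the cluster's subnet is not covered by the export entry, you get exactly the "access denied by server" error shown above.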

I would also explore setting up a directly managed pod on the Kubernetes cluster and setting up a similar volume mount (just building such a pod manually yourself). This looks mostly like either the Kubernetes Node or the pods themselves are having a hard time talking to / being trusted by the NFS server. I.e. the NFS server could be blocking the pod's IP address from mounting, firewall, NACL rule, etc.
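A minimal test pod along those lines might look like the following. This is only a sketch: the server IP and path are placeholders you would replace with your own values.

```yaml
# nfs-test-pod.yaml -- hypothetical manifest for manually testing the NFS mount
apiVersion: v1
kind: Pod
metadata:
  name: nfs-test
spec:
  containers:
    - name: shell
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - name: nfs-vol
          mountPath: /mnt/nfs
  volumes:
    - name: nfs-vol
      nfs:
        server: 10.0.1.4        # placeholder: private IP of the NFS host
        path: /home/azureuser   # placeholder: exported path
```

If kubectl apply -f nfs-test-pod.yaml leaves the pod stuck in ContainerCreating with the same mount error, the problem is between the node and the NFS server rather than in Workbench itself, since NFS volumes are mounted by the kubelet on the node.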


Hi @cole ,

Thank you for your insights; you pointed us in the right direction.

Here is a brief explanation of our infrastructure and how we found the source of the issue:

Infrastructure

  • AKS (Azure Kubernetes Service) with Azure CNI as the network configuration
  • VM (Virtual machine) with RSW (RStudio Workbench)
  • VM with NFS host (Network File System)

All the resources are on the same VNET (Virtual Network); the AKS cluster is in one subnet and the VMs are in a different one.
The VNET has an NSG (Network Security Group) that allows traffic only to specific ports, while all traffic inside the VNET is allowed by default.

Troubleshooting

We discovered that, with all the ports open between the NFS host and the client, we were getting the "access denied" error mentioned above.

However, if you do not open all the ports between the NFS host and the client, you will get a timeout instead.
This is because the client tries to reach the NFS server on the several ports commonly used by the NFS protocol (111, 2049, 1110, 4045, 892, etc.), and when something is wrong with the communication it never gets a response back from the NFS host.
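Before involving Kubernetes at all, a quick probe with nc can show which of those ports are actually reachable from the client side (nfs-host here is a placeholder for your NFS server's private IP or name):

```shell
# Probe the common NFS-related ports from the client (placeholder host name)
for port in 111 2049 1110 4045 892; do
  nc -z -w 2 nfs-host "$port" && echo "port $port open" || echo "port $port closed/filtered"
done
```

A port that reports closed/filtered but is open in the NSG usually points to a routing or topology problem rather than a firewall rule.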

We were able to see this by trying to mount the NFS directory on a pod, as you suggested, Cole.
On the pod we executed the mount command, and we obtained the IP with which the pod communicates with the outside world using curl ifconfig.me.

Then, on the NFS host, we executed the following command to observe all traffic on port 2049 (the default port for the NFS protocol):

sudo tcpdump -i eth0 -nn -s0 -v port 2049 | grep "<AKS-IP>"

The interface in our case was "eth0".

Issue

Basically, the network topology was not correctly set up, and after a few changes everything worked.

For example, be careful: if the machines or pods are on the same VNET, they can only connect to each other using private IPs. In our case, after the network changes we also had to change the "Host" parameter in "launcher-mounts" from an "A record" registered in the Azure-provided DNS to a private IP, so that the communication between the pods and the NFS host goes through private IPs.
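For anyone hitting the same thing, the resulting launcher-mounts entry looks like the one at the top of this thread, just with the private IP filled in as the Host (10.0.1.4 is a made-up example address):

```
MountType: NFS
Host: 10.0.1.4
Path: /home/{USER}
MountPath: /home/{USER}
ReadOnly: false
Cluster: Kubernetes
```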


Hopefully this can be helpful for someone else with the same issue :wink:


This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.
