Can't connect sparklyr to shiny

#1

I’m pretty new to Shiny and Spark.

I want to deploy a ShinyApp with a spark connection. Everything works how it should when I just hit RunApp, but whenever I try to publish it, I get the error: “Error in value[3L] :
SPARK_HOME directory ‘/usr/lib/spark’ not found
Calls: local … tryCatch -> tryCatchList -> tryCatchOne ->
Execution halted”

This directory exists on my cluster, so I’m not sure why it’s not finding it.
Here’s the code I’m trying to publish.

library(sparklyr)
library(shiny)
library(ggplot2)
library(rmarkdown)


Sys.setenv(SPARK_HOME = '/usr/lib/spark')
config<- spark_config()
spark_install(version = "2.2.0")
sc<-spark_connect(master = 'yarn-client',  version = '2.2.0')
tbl_cache(sc, 'output_final_v2')
output_tbl2<-tbl(sc, 'output_final_v2') 


ui <- fluidPage(
  
  textInput("name", "Enter Shortname", "furycat"),
  #selectInput("shortname", "Choose Shortname", choices = l),
  textInput("item_name", "Enter Item Name"),
  selectInput("month", "Choose Month", choice= c("January","February","March", "April", 
                                                 "May", "June", "July", "August", "September", 
                                                 "October", "November", "December")),
  selectInput("dow","Choose Day of Week", choice = c("Monday", "Tuesday", "Wednesday",
                                                     "Thursday", "Friday", "Saturday", "Sunday")),
  numericInput("count_customers", "Enter Number of Customers:", 2),
  numericInput("views", "Enter Number of View Book Form:", 30),
  
  
  plotOutput("plot1"),
  plotOutput("plot2"),
  plotOutput("plot3")
  
)

server <- function(input, output, session) {
  
  
  C2<-reactive( output_tbl2 %>%
                  mutate(views = input$views)%>%
                  filter(input$name == shortname)%>%
                  filter(input$dow== dow)%>%
                  filter (input$month == month)%>%
                  filter (input$item_name == item)%>%
                  filter (input$count_customers == count_customers)%>%
                  collect)
  output$plot1 <- renderPlot({
    
    p1<-ggplot2::ggplot(data = C2() , aes(x=price_per_customer, y=final_probability)) + geom_line() + ggtitle("Probability of Purchase") + labs(y="Probability",x= "Item Price")
    print(p1)
  })
  
  
  output$plot2 <- renderPlot({
    
    p2<-ggplot2::ggplot(data=C2(), aes(x=price_per_customer, y=((views*final_probability)*price_per_customer))) + geom_line() + geom_hline(aes(yintercept = max((views*final_probability)*price_per_customer))) + ggtitle("Projected Revenue") + labs(y="Expected Revenue",x="Item Price")  
    print(p2)
    
  })
  
  output$plot3<-renderPlot({
    
    p3<-ggplot2::ggplot(data=C2(), aes(x=price_per_customer))+ geom_line(aes(y=(views*final_probability)*price_per_customer)) + geom_line(aes(y= (((views*final_probability)/price_per_customer)))) + ggtitle("Iso-Profit vs Expected Volume")
    
    print(p3)  
  })
  
  
}

shinyApp(ui, server)

#2

That’s to be expected. Deploying a Spark application is not trivial. You can start by reading this overview. What you were attempting to do above is to access a directory on your local machine from a remote server. While the app directory is uploaded when you deploy an app successfully, no other directories are.


#3

Thanks for responding! I tried running this command cluster_url <- system('cat /root/spark-ec2/cluster-url', intern=TRUE) but it says permission denied. Is there a way to grant myself permission within R?


#4

Hi @alex, can you share with me some more info about your setup? Is the app being deployed in a Shiny server inside the same box where everything runs fine? If not, is the Shiny server also part of the cluster?


#5

If you have root access (which you should if you set up the cluster), you could try running system('sudo cat /root/spark-ec2/cluster-url', intern=TRUE), and it may work. Otherwise, you may have to play a bit with system to give yourself permission (e.g. this post on SO might help). In any case, as the docs say this will only work “if you are running on EC2 using the Spark EC2 deployment scripts”. Did you follow these instructions to do so?

I’m afraid I can’t give you any more advice as I’m not that familiar with sparklyr myself. Hopefully others will pitch in if you’re still struggling. Good luck!


#6

Hey @edgararuiz. I don’t believe I’m running shiny server from the same box. I just type my code in App.R and hit run and that’s where it works. Shiny Server is part of the cluster though.


#7

Ok, can you confirm that the server that has Shiny Server is also a recognized host in the cluster, and that it has at least the following services installed: Hadoop, YARN, Hive and Spark? The error you’re getting makes me think that if it is part of the cluster, it is missing at the very least the Spark service.
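One rough way to check this from a terminal on the Shiny Server machine is to see which client commands are on the PATH. This is only a sketch: `check_cmds` is a made-up helper name, and the command names below are the usual ones but are an assumption about your distribution. A service can also be installed without its command being on the PATH, so "missing" here is a hint rather than proof.

```shell
# Print whether each named command is found on PATH.
# check_cmds is a hypothetical helper, not part of Hadoop or Spark.
check_cmds() {
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "found: $cmd"
    else
      echo "missing: $cmd"
    fi
  done
}

# Assumed client command names; adjust to your distribution.
check_cmds hadoop yarn hive spark-submit
```

Running it as the user that Shiny Server runs under, rather than your own login, gives a more realistic picture of what the deployed app can see.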


#8

@edgararuiz I’m sorry, but I’m not totally following you.
I think I may have gone wrong when setting up the cluster, but I’m not totally sure. All of this is very new for me. I followed these instructions to initially set up the cluster, added a port for Shiny Server, and went through the Shiny Server installation steps. However, I did not do anything that’s included in the doc that @barbara linked to. Is this where I went wrong?


#9

@alex, my bad, I didn’t realize this is your cluster. Can you remote into the server that has Shiny via a terminal session and navigate to the /usr/lib/spark folder?
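For reference, the check could look something like the sketch below. `check_dir` is a made-up helper name; the path is the one from the error message.

```shell
# Report whether a directory exists and can be read and entered by the current user.
# check_dir is a hypothetical helper, not part of Spark or Shiny Server.
check_dir() {
  if [ ! -d "$1" ]; then
    echo "missing: $1"
  elif [ ! -r "$1" ] || [ ! -x "$1" ]; then
    echo "not readable: $1"
  else
    echo "ok: $1"
  fi
}

# Path taken from the SPARK_HOME error message.
check_dir /usr/lib/spark
```

Keep in mind that this tests your own login. A deployed app may run as a different OS user, so it is worth repeating the check as that user too (for example with `sudo -u <user>`, where the user name depends on your Shiny Server configuration).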


#10

No worries. Yes, I can remote into it via the terminal session.


#11

Should shiny-server be a directory within /usr/lib/spark? @edgararuiz


#12

I don’t think so, we just need to verify the contents of /usr/lib/spark and, now that you mention it, /opt/shiny-server


#13

When I run that, it confirms that it’s a directory, and I can see the files that are in shiny-server.


#14

Ok, for simplicity’s sake, can we replace the top portion of the code with this and try again please?



library(sparklyr)
library(shiny)
library(ggplot2)
library(rmarkdown)

# spark_home points at the directory named in the error message
sc <- spark_connect(master = 'yarn-client',
                    version = '2.2.0',
                    spark_home = "/usr/lib/spark")


tbl_cache(sc, 'output_final_v2')

#15

I get the same error even with the simplified code.


#16

I’m curious, if you run R in the terminal of the server where Shiny is, and run the same lines I asked you to replace, does that error out?


#17

I had to install the packages first, and it won’t install sparklyr. It keeps saying non-zero exit status.


#18

Ok, how did you install the packages for the Shiny app to work? If the R packages were not pre-installed before deploying the app, then this may be an issue of Linux package dependencies. I’d suggest running this in the terminal:

sudo apt-get -y install libcurl4-gnutls-dev
sudo apt-get -y install libssl-dev
sudo apt-get -y install libxml2-dev

#19

It says that all of those packages are already installed and up to date.


#20

I no longer get my original error, but my new errors are:
ignore --preserve-environment, it's mutually exclusive to --login.
Failed during initialize_connection: org.apache.hadoop.security.AccessControlException: Permission denied: user=shiny, access=WRITE, inode="/user/shiny/

I know the second one is a permission issue. I’m just not sure how to fix it.
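A common cause of this kind of AccessControlException is that the user running the app has no writable home directory in HDFS. The sketch below is one possible fix under several assumptions: that the app runs as the OS user `shiny` (taken from the error message), that the directory it needs is `/user/shiny` (the path in the error is cut off, so this is a guess at the parent), and that `hdfs` is the HDFS superuser account on this cluster. The `run` and `fix_hdfs_home` helpers are made-up names. By default it only prints the commands so they can be reviewed first.

```shell
# Assumption: the app runs as OS user "shiny" (from the error message).
APP_USER="${APP_USER:-shiny}"
# Assumption: the user's HDFS home is /user/<name>; the error's path is truncated.
HDFS_HOME="/user/${APP_USER}"

# Print a command; execute it only when DRY_RUN=0 (default is print-only).
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Create the HDFS home directory and hand ownership to the app user.
# Assumption: "hdfs" is the HDFS superuser account on this cluster.
fix_hdfs_home() {
  run sudo -u hdfs hdfs dfs -mkdir -p "$HDFS_HOME"
  run sudo -u hdfs hdfs dfs -chown "$APP_USER" "$HDFS_HOME"
}

fix_hdfs_home
```

If the printed commands look right for your cluster, running the script again with `DRY_RUN=0` executes them.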