How can we integrate a random forest model into Shiny?

I am trying to develop an application in which the end user uploads the test data and can model it with a click of an action button.

Behind the action button, I would like to run my preprocessing steps (imputing the missing values, changing the data types) and then predict the test result.

I have already written the UI and server code for uploading the file from an external environment.

I am stuck on how to integrate my R model and these procedures into it.

Below are the server code and the R code I have tried so far.

Server code:

library(shiny)

shinyServer(function(input, output) {

  # input$file is NULL until a file has been uploaded;
  # input$file$datapath then gives the path of the uploaded file
  data <- reactive({
    file1 <- input$file
    if (is.null(file1)) return(NULL)
    read.table(file = file1$datapath, sep = input$sep, header = input$header,
               stringsAsFactors = input$stringAsFactors)
  })

  output$filedf <- renderTable({
    if(is.null(data())){return ()}
    input$file
  })

  output$sum <- renderTable({
    if(is.null(data())){return ()}
    summary(data())
  })

  output$table <- renderTable({
    if(is.null(data())){return ()}
    data()
  })

  output$tb <- renderUI({
    if (is.null(data())) {
      # Placeholder shown until a file has been uploaded
      h5("Powered by", tags$img(src = 'RStudio-Ball.png', height = 200, width = 200))
    } else {
      tabsetPanel(
        tabPanel("About file", tableOutput("filedf")),
        tabPanel("Data", tableOutput("table")),
        tabPanel("Summary", tableOutput("sum"))
      )
    }
  })
})

And this is the R code I have written so far for the type conversion, imputation, and random forest:

claim[,c(2:12,16:22,25,30,31,33,32,34)] <- lapply(claim[,c(2:12,16:22,25,30,31,33,32,34)], as.numeric)
claim[,c(1, 13:15)] <- lapply(claim[,c(1, 13:15)], as.Date, format = "%d.%m.%Y")

Missing value imputation:

library(mice)

mice_impute <- mice(New, m = 5, method = 'rf')  # five imputed datasets, random forest method
Imputed_data <- mice::complete(mice_impute, 3)  # use the third imputed dataset

Random forest model:

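library(randomForest)

# exclude = "" drops the empty string from the factor levels (such values become NA)
Imputed_data$claim.Qty.Accepted <- factor(Imputed_data$claim.Qty.Accepted, exclude = "")
summary(Imputed_data$claim.Qty.Accepted)
train <- Imputed_data[1:3325,]
validate <- Imputed_data[3326:3801,]
test <- Imputed_data[3802:4750,]  # start at 3802 so row 3801 is not in both validate and test
names(original)
fit <- randomForest(claim.Qty.Accepted~., data=train, na.action=na.exclude)
print(fit)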


p1 <- predict(fit, train)
caret::confusionMatrix(p1, train$claim.Qty.Accepted)

p2 <- predict(fit, validate)
caret::confusionMatrix(p2, validate$claim.Qty.Accepted)

p3 <- predict(fit, test)
caret::confusionMatrix(p3, test$claim.Qty.Accepted)

A step-by-step procedure that could guide me would also be helpful.

Looking at your code, there are a number of things that you set based on one specific dataset, so IMO most of the work here will be figuring out the best ways to sanitize inputs based on what you think your users will need.

For example:

  • If you're letting users upload any file, you probably want a reactive input that lets the user select the dependent variable (and whether to turn it into a factor or a numeric), or that shows a message if the data frame doesn't contain a variable called claim.Qty.Accepted.

  • Additionally, the hard-coded numeric column indices will likely not work, so you should either add some way for the user to select variables (and what to do with them) or add code that figures out the best data type for each column. (IMO allowing the user to select or override is probably best, but if you don't want to create extra work for the user, I would recommend applying a function that guesses the best data type for each column and converts it.)

  • The train/validate/test set sizes should probably be based on percentages of the number of rows in the data, or on something the user can select.
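To make the last two points concrete, here is a rough base-R sketch, not a definitive implementation. The helper names guess_types() and split_rows() are made up for illustration, the date format is the one from the question ("%d.%m.%Y"), and the 70/10/20 proportions are just the question's row ranges (3325, about 475, and 950 out of 4750 rows) expressed as fractions. It assumes the columns were read in as character.

```r
# Hypothetical helper: guess a type for every character column.
# A column becomes a Date only if all of its non-empty values parse with
# `date_format`; otherwise type.convert() picks logical/integer/numeric/character.
guess_types <- function(df, date_format = "%d.%m.%Y") {
  df[] <- lapply(df, function(col) {
    if (!is.character(col)) return(col)
    has_value <- !is.na(col) & col != ""
    parsed <- as.Date(col, format = date_format)
    if (any(has_value) && all(!is.na(parsed[has_value]))) return(parsed)
    type.convert(col, as.is = TRUE)
  })
  df
}

# Hypothetical helper: shuffle the rows and split them by proportion
# instead of by hard-coded row numbers.
split_rows <- function(df, props = c(train = 0.7, validate = 0.1, test = 0.2),
                       seed = 1) {
  stopifnot(!is.null(names(props)), abs(sum(props) - 1) < 1e-8)
  set.seed(seed)
  n <- nrow(df)
  idx <- sample(n)
  # round() rather than floor() so floating-point sums like 0.7 + 0.1 behave
  sizes <- diff(c(0, round(cumsum(props) * n)))
  group <- factor(rep(names(props), times = sizes), levels = names(props))
  split(df[idx, , drop = FALSE], group)
}

# Small made-up data frame, everything read in as character
raw <- data.frame(
  qty   = c("1", "2", "", "4", "5", "6", "7", "8", "9", "10"),
  day   = rep(c("01.02.2020", NA), 5),
  label = rep(c("yes", "no"), 5),
  stringsAsFactors = FALSE
)

clean <- guess_types(raw)
parts <- split_rows(clean)
str(clean)
sapply(parts, nrow)
```

In the app you would probably still let the user override the guessed types, since a guess like this can be wrong (for example, numeric IDs that should really be treated as categories).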

The cleaned version of the dataset should be a reactive object based on both the uploaded data and any intermediate selections the user makes (or their defaults).

Once you have all of this taken care of (and it's giving the results you would expect), imputing missing values and running the model can probably be put inside a reactive event tied to the action button, and the predictions can be based on its output.
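A minimal sketch of what such a cleaned reactive could look like, assuming a CSV upload. The input IDs target and target_type and the helper prepare_target() are hypothetical names of mine; only input$file and the column name claim.Qty.Accepted come from the question.

```r
library(shiny)

# Hypothetical helper: check that the dependent variable exists and coerce it.
prepare_target <- function(df, target, as_factor = TRUE) {
  if (!target %in% names(df)) {
    stop("Column '", target, "' not found in the uploaded data.")
  }
  df[[target]] <- if (as_factor) {
    factor(df[[target]])
  } else {
    as.numeric(as.character(df[[target]]))
  }
  df
}

ui <- fluidPage(
  fileInput("file", "Upload data (CSV)"),
  uiOutput("target_ui"),
  radioButtons("target_type", "Treat dependent variable as",
               choices = c("factor", "numeric")),
  tableOutput("preview")
)

server <- function(input, output, session) {
  # Raw upload; req() stops here quietly until a file is present
  raw <- reactive({
    req(input$file)
    read.csv(input$file$datapath, stringsAsFactors = FALSE)
  })

  # Column picker built from the uploaded data, defaulting to the
  # question's column name when it is present
  output$target_ui <- renderUI({
    cols <- names(raw())
    default <- if ("claim.Qty.Accepted" %in% cols) "claim.Qty.Accepted" else cols[1]
    selectInput("target", "Dependent variable", choices = cols, selected = default)
  })

  # Cleaned data: depends on the upload and on the user's selections
  cleaned <- reactive({
    req(input$target)
    validate(need(input$target %in% names(raw()),
                  "Selected column is not in the data."))
    prepare_target(raw(), input$target,
                   as_factor = input$target_type == "factor")
  })

  output$preview <- renderTable(head(cleaned()))
}

# Build the app object; start it with runApp(app)
app <- shinyApp(ui, server)
```

Anything downstream (imputation, model, predictions) would then read from cleaned() rather than from the raw upload, so it updates whenever the file or a selection changes.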
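As a rough illustration of the reactive-event part, the sketch below wires an action button to eventReactive(). It is one possible arrangement under several assumptions: the upload is a CSV that has already been cleaned, the dependent variable is claim.Qty.Accepted, and the first 70% of rows are used for training. The names run_model(), impute_with_mice(), the input ID run, and the train_frac parameter are hypothetical. The imputation call mirrors the one in the question and needs the mice package (and whichever random forest backend it uses) installed.

```r
library(shiny)
library(randomForest)

# Imputation as in the question: mice with method = "rf", third completed dataset.
# Any function that takes a data frame and returns a completed one could be used.
impute_with_mice <- function(df) {
  imp <- mice::mice(df, m = 5, method = "rf", printFlag = FALSE)
  mice::complete(imp, 3)
}

# Hypothetical helper: impute, fit on the first `train_frac` of the rows,
# and predict the remaining rows.
run_model <- function(df, target, train_frac = 0.7, impute = impute_with_mice) {
  df[[target]] <- factor(df[[target]])   # classification, as in the question
  df <- impute(df)
  n_train <- round(train_frac * nrow(df))
  train <- df[seq_len(n_train), , drop = FALSE]
  test <- df[-seq_len(n_train), , drop = FALSE]
  fit <- randomForest(reformulate(".", response = target), data = train)
  list(fit = fit,
       actual = test[[target]],
       predictions = predict(fit, newdata = test))
}

ui <- fluidPage(
  fileInput("file", "Upload data (CSV)"),
  actionButton("run", "Run model"),
  tableOutput("predictions")
)

server <- function(input, output, session) {
  # Runs only when the button is clicked, not on every input change
  results <- eventReactive(input$run, {
    req(input$file)
    df <- read.csv(input$file$datapath, stringsAsFactors = TRUE)
    validate(need("claim.Qty.Accepted" %in% names(df),
                  "Column claim.Qty.Accepted not found."))
    run_model(df, target = "claim.Qty.Accepted")
  })

  # Predictions are derived from the event's output
  output$predictions <- renderTable({
    res <- results()
    data.frame(actual = res$actual, predicted = res$predictions)
  })
}

# Build the app object; start it with runApp(app)
app <- shinyApp(ui, server)
```

Keeping the modelling steps in a plain function like run_model() means you can try them in the console first, for example run_model(iris[sample(nrow(iris)), ], "Species", impute = identity), before wiring them into the app.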


Maura, could you give me some sample code, using a sample dataset, that I could work with?
It would really help me to proceed further.

Thank you in advance.

I don't have any code specifically tailored to this purpose, but the dev center has a lot of examples (with code) that I found super helpful when I was getting started with Shiny:

https://shiny.rstudio.com/gallery/

The retirement simulator in particular is an example of running a model based on parameters that a user can change.
