Operationalizing algorithms using Shiny and Flask
Presentation by Shatrunjai Singh
You can view the recording on YouTube.
An algorithm is only as valuable as its adoption. Speed to value, repeatability, and low-cost solutions can dramatically reduce software and services budgets and free up valuable dollars for other activities. Open-source tools such as Shiny (R) and Flask (Python) have made the creation and deployment of data-science web applications convenient and manageable. In the healthcare data science world, we routinely wrap sophisticated statistical code into such web-based point-and-click solutions. In this talk, you will learn from real-life examples how one can rapidly operationalize intricate algorithms using web app frameworks.
Shatrunjai ‘Jai’ Singh is a Lead Data Scientist at Aetna, a CVS Health Company, and specializes in data mining, predictive modeling, and data visualization. His work has received several awards, including recognition from the American Heart Association and the Epilepsy Foundation. He won the Tableau Chart-Champion competition in 2016 and was included in LIMRA International’s ‘40 under 40’ for innovation.
Thanks all for joining!
How do you set up a quick tour of the Shiny app?
Have you encountered scenarios where you want to run clustering on big data sets (upwards of 10 million rows or 5gb on disk) from within Shiny?
- Most of the Shiny apps that I’ve shown you today run on smaller datasets. For bigger datasets, Flask- or Django-based apps in Python perform better. Parallelization and just-in-time compilation methods can also be useful here.
This is a great application! What kind of process do you use for updating the application? Do you have any sort of defined deployment process?
- We deploy through RStudio Connect, with a production version and a dev version. First we build a very simple version of the app that does just the basic analysis and upload it to the development server. We give it to a small number of people and perform a “test and learn”: we ask them to use it the way they traditionally would, which gives us hints about where they got stuck and what could be improved. This usually takes a month. We then fold the most critical feedback into the dev tool and launch it to a somewhat larger population, and if there are no big hiccups we promote it from the dev version to the prod version and launch it to the entire company. We re-calibrate our analysis every 1.5 months early on, and after the first six months we move to a six-month iterative calibration cycle: has the data changed, have some of the packages changed, is the app working as it was originally designed to? That is the traditional cycle we use.
Why in your opinion Flask in comparison with Shiny is not great for easy data visualisation?
- Flask is great for data visualization but lacks the wide variety of premade packages available in R Shiny to do everything under the sun. This might also be a personal preference.
How do you position Shiny relative to Tableau? What would you use Tableau for, and when do you decide what has to be in Shiny?
- That’s a great question. A little background: I was a big Tableau user; I actually won the Tableau Massachusetts competition three years ago. But Tableau has limitations. The main one is that you cannot do advanced analytics in Tableau. You can do visualizations, you can slice and dice data, but the most advanced data science you can do in Tableau right now is regression: you can fit a simple line to your dataset. If you want to use algorithms such as the segmentation or the comorbidity association rules I’ve just shown here, Tableau can’t do that. Tableau can make the results look pretty, but to do the analysis you will have to use R or Python. With Shiny or Flask you can use the latest and greatest data science analytics alongside visualizations. The visualizations may not look as polished as Tableau’s, which are designed by professionals, but the analytics will be more advanced and you can run these more complicated analyses.
Any challenge in building shiny apps to scale it well? And how do you manage it?
- Apps that scale need to be built in a modular fashion, with each element doing only the bare minimum of work, and only when required. In my experience, using ‘shinytest’ significantly accelerates building apps that scale.
On how many insured persons are these analyses based on?
- Most big US health insurance companies have ~2 million insured members at any given point in time. These apps were built on this data.
Could you please link the github or a resource for the chat bot?
- I cannot share the code for these apps because it belongs to the company now, but I have many other Shiny apps that use very similar methodologies, as well as various Tableau dashboards and Flask apps: shatrunjai (Shatrunjai Singh) · GitHub
Interested in the chat module. Can we have real-time customer service through this tool? Or are the answers just based on NLP?
- I have not done that; in building this app I was hoping to automate all of the questions people would have. With Dialogflow in Google, you can also add third-party integrations: if a customer asks a question that does not match any of the pre-listed questions, it can ping you at a given address and let you know that you should transfer that person to a live rep.
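The escalation idea above can be sketched as a tiny fulfillment handler: answer intents you recognize, and ping a human for anything else. This is a hypothetical illustration, not the speaker's code; the payload shape loosely follows Dialogflow's webhook-request JSON (`queryResult.intent.displayName`, `queryResult.queryText`), and `known_intents`/`notify` are invented names.

```python
def route_webhook(payload, known_intents, notify):
    """Answer a matched intent directly; escalate unmatched questions.

    payload       -- dict shaped like a Dialogflow webhook request (assumed)
    known_intents -- dict mapping intent display names to canned answers
    notify        -- callable used to alert a human (e.g. send an email)
    """
    intent = payload["queryResult"]["intent"]["displayName"]
    if intent in known_intents:
        return {"fulfillmentText": known_intents[intent]}
    # No pre-listed answer: ping a human and reassure the user.
    notify(payload["queryResult"]["queryText"])
    return {"fulfillmentText": "Let me connect you with a live representative."}
```

In a real Flask app this function would sit behind a `POST` route that parses the incoming JSON and returns the dict as the webhook response.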
The overviews say that the EDAs have some component of missing data, can you show that in the shiny apps?
- I used the DataExplorer package to visualize and summarize missing data.
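DataExplorer is an R package whose `plot_missing()` charts the share of missing values per feature. As a rough Python analogue (an illustration only, not the speaker's approach), the same per-column summary can be computed in a few lines:

```python
def missing_profile(rows):
    """Percent of missing (None) values per column, similar in spirit to
    DataExplorer's missing-data plot but returning numbers instead of a chart.

    rows -- list of dicts that all share the same keys (columns)
    """
    columns = rows[0].keys()
    n = len(rows)
    return {col: 100.0 * sum(r[col] is None for r in rows) / n
            for col in columns}
```

The resulting percentages are exactly what such a plot displays, so they can feed any charting layer.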
Could you discuss what the process looks like for publishing code to GitHub from CVS? Do you go through an approval process, and could you touch on that?
- We have our own internal GitHub at CVS that we use. We do not publish our code on external repositories, for obvious reasons.
Could you talk about the bias detection app - is there public info?
- There is a lot of free public information out there. IBM came up with AI Fairness 360, a comprehensive set of tools built by researchers and university professors; it is a very extensive package, available for both R and Python. It can be confusing because it exposes so many details, and a user could easily be unsure which metrics to use. Google has come out with its own fairness tool, a web application where you can upload your dataset and it will tell you whether there is a race or gender bias in it. So there are different methodologies out there. For your own specific use case, I would say you have to study the material and work out what is most useful to you, but there is a lot of literature available.
What is the front end packages you used to build EQUAL? The interface doesn't look like Bootstrap themes in a typical shiny app.
Can you speak briefly on your approach to addressing missing data?
- In our apps we use mean/median imputation for most purposes, though random-forest-based predictions are also frequently used for data not missing at random. We also drop features with more than 15% of their data missing.
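The drop-then-impute rule described above can be sketched in a few lines of Python. This is an illustrative stdlib version only (function and parameter names are invented, and the random-forest imputation for data not missing at random is out of scope here):

```python
import statistics

def impute_and_filter(columns, drop_threshold=0.15):
    """Drop any feature with more than `drop_threshold` of its values
    missing, then median-impute the remaining missing values.

    columns -- dict mapping feature name to a list of values (None = missing)
    """
    cleaned = {}
    for name, values in columns.items():
        frac_missing = sum(v is None for v in values) / len(values)
        if frac_missing > drop_threshold:
            continue  # feature dropped entirely
        median = statistics.median(v for v in values if v is not None)
        cleaned[name] = [median if v is None else v for v in values]
    return cleaned
```

Mean imputation would only swap `statistics.median` for `statistics.mean`; the dropping step runs first so features that are mostly missing never get imputed at all.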