Omics BioAnalytics: Reproducible Research using R Shiny and Alexa - 2020 Shiny Contest Submission

Omics BioAnalytics

Authors: Amrit Singh
Working with Shiny more than 1 year

Abstract: Omics BioAnalytics is a web/voice application for interrogating multiple high-dimensional biological datasets, such as proteomics and transcriptomics, in order to identify key predictors of disease. Features include commonly performed analyses such as descriptive statistics and inference on the metadata of biological samples, exploratory data analysis, differential expression analysis and biomarker discovery analysis. Dynamic reporting is implemented to allow users to create reports of their results. Users can either upload their data and use the web app to generate interactive visualizations, or submit the analysis to Alexa, in which case they are given a 7-digit code. The 7-digit code can then be used to access their analysis on an Alexa device and explore their data using voice commands. Omics BioAnalytics is an initiative to lower the barrier to entry for non-specialists in bioinformatics using both web- and voice-based analytics. The use of dynamic dashboards, dynamic reporting, interactive visualizations and voice-enabled analytics using various bioinformatics tools will enable and expedite a thorough interrogation of omics datasets.

Full Description: I learned about the Shiny contest a week before the March deadline and decided it would be an excellent opportunity to demonstrate a proof of concept of another app I am developing called Omics Central (https://github.com/singha53/singha53.github.io/blob/source/src/assets/img/portfolio/omicsCentral_poster.pdf). R Shiny is what really motivated me to become a developer and learn industry standards such as React and AWS. Omics Central uses a serverless stack: the React front-end is hosted on S3, whereas the backend is built entirely on AWS services. The need for it arose from bioinformatics analyses that are compute-intensive, such as hyperparameter tuning for machine learning models. Users simply upload their data, select the analysis they want to run and submit the job, which runs Docker containers consisting of R/Python scripts; the results are saved to DynamoDB and S3. Then it’s a matter of querying the databases and visualizing the plots. This app is under construction…

For this contest I decided to recycle an app I made for a publication 2 years ago, which really got my journey as a developer rolling (https://amritsingh.shinyapps.io/multiomics_HFhospitalizations/). The purpose of this Shiny app was to allow readers to reproduce the results of my paper (Singh et al., CJC 2019). Originally I just wanted to extend the app so that it would take in user data and run through the same set of analyses. However, one thing led to another and, given the two-week extension, I was able to extend the app in a number of ways:

  1. I wanted to make it easier to write up reports using the app, so I added a dynamic reporting feature (the Shiny Gallery was very helpful), where users can create different sections of a document (e.g. introduction, methods, etc.), add figures and tables, and generate a Word document that they can share.
  2. Originally the app was designed for two-group comparisons; however, I came across a COVID-19 transcriptomics dataset, which motivated me to extend some features of the app, such as statistical inference and differential expression analysis, to handle more than two categories. I didn’t have enough time to extend the biomarker discovery analysis to multinomial models, so that is limited to binary classification, although users can decide which two groups from a categorical variable to develop models for. Since this was a COVID-19 transcriptomics dataset, I added drug enrichment functionality to allow for the enrichment of drug candidates that can reverse the gene expression signatures. Potential treatments? Maybe.
  3. I have always wanted to see a ggplot on an Alexa device. Therefore, I made a multimodal Alexa Skill to explore the data, similar to the web app but using voice commands. I have previously made an Alexa Skill as a cool project with my daughter (https://www.amazon.com/SinghIsKing-Kenza-Travels/dp/B08252TBW9), so this wasn’t too difficult. The cloudyr R library was useful for sending data to S3 and DynamoDB from R. Currently, this is a very simple Alexa Skill, but it can be made much more complex in terms of the voice commands it recognises and the types of analyses it can do. This skill does a lot of querying from S3 and DynamoDB, so I have not submitted it for certification (otherwise I would have to incur the cost of its use); however, I have provided the source code and instructions so users can set it up themselves (https://github.com/singha53/omics-bioanalytics-alexa-skill).
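
The hand-off from the web app to the Alexa Skill relies on a 7-digit code that identifies an analysis whose results are stored in the cloud. The following is only a minimal sketch of that idea in base R, not the code used in the app: the helper names `generate_access_code` and `make_object_key`, the key layout and the bucket name in the comment are all assumptions made for illustration.

```r
# Hypothetical helper: draw a random 7-digit access code.
# The code is kept as a character string so leading zeros are preserved.
generate_access_code <- function(n_digits = 7) {
  digits <- sample(0:9, size = n_digits, replace = TRUE)
  paste(digits, collapse = "")
}

# Hypothetical helper: build the storage key for one result
# (for example a saved plot) belonging to a given access code.
# The "<code>/<name>.<ext>" layout is an assumption, not the app's scheme.
make_object_key <- function(code, name, ext = "png") {
  stopifnot(grepl("^[0-9]{7}$", code))
  sprintf("%s/%s.%s", code, name, ext)
}

code <- generate_access_code()
key <- make_object_key(code, "pca_plot")

# With AWS credentials configured, an upload step could then look like:
# aws.s3::put_object(file = "pca_plot.png", object = key,
#                    bucket = "my-hypothetical-bucket")
```

Grouping every stored object under the access code means a voice session only needs that one code to locate all results of an analysis. A real deployment would also need to check that a newly drawn code is not already in use.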

Given all this work, I decided to write it up as a short manuscript.


Category: Research
Keywords: reproducible research, omics, data analytics, alexa, aws, bioinformatics, biomarkers, dynamic reports, machine learning
Shiny app: https://amritsingh.shinyapps.io/omicsBioAnalytics/
Repo: https://github.com/singha53/omicsBioAnalytics
RStudio Cloud: https://rstudio.cloud/project/1107591

Video demos

Omics BioAnalytics web app demo using heart failure multi-omics data

Omics BioAnalytics web app demo using COVID19 transcriptomics data

Omics BioAnalytics Multimodal Alexa Skill demo using heart failure multi-omics data