BEYOND DASHBOARD FATIGUE
SEAN LOPP, PRODUCT MANAGER & RICH IANNONE, SOFTWARE ENGINEER AT RSTUDIO
July 14, 2020 at 1pm-2pm Eastern Time
Recording (coming soon)
What You'll Learn
Data science teams face a challenging task. Not only do they have to gain insight from data, they also have to persuade others to make decisions based on those insights. To close this gap, teams rely on tools like dashboards, apps, and APIs. But unfortunately data organizations can suffer from their own success - how many of those dashboards are viewed once and forgotten? Is a dashboard of dashboards really the right solution? And what about that pesky, precisely formatted Excel spreadsheet finance still wants every week?
In this webinar, we’ll show you an easy way teams can solve these problems using proactive email notifications through the blastula and gt packages, and how RStudio pro products can be used to scale out those solutions for enterprise applications. Dynamic emails are a powerful way to meet decision makers where they live - their inbox - while displaying exactly the results needed to influence decision-making. Best of all, these notifications are crafted with code, ensuring your work is still reproducible, durable, and credible.
We’ll demonstrate how this approach provides solutions for data quality monitoring, detecting and alerting on anomalies, and can even automate routine (but precisely formatted) KPI reporting.
Questions and Answers
In writing to the csv that gets attached, is there a way to just write that CSV to memory or attach it directly to the email from a temp file that gets deleted after the script runs?
Currently, this has to use a file.
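That said, a temporary file works well in practice: write the CSV to a path from tempfile(), attach it, and remove the file after sending. A sketch (the data, attachment name, and addresses below are placeholders, not from the webinar):

```r
library(blastula)

# Write the data to a temporary file that lives only for this run
tmp_csv <- tempfile(fileext = ".csv")
write.csv(mtcars, tmp_csv, row.names = FALSE)

# Attach it with a friendly display name for recipients
email <- compose_email(body = md("Attached: this week's numbers."))
email <- add_attachment(email, file = tmp_csv, filename = "weekly_numbers.csv")

# Send as usual (recipient/sender/credentials are placeholders):
# smtp_send(email, to = "team@example.com", from = "reports@example.com",
#           subject = "Weekly numbers",
#           credentials = creds_envvar(pass_envvar = "SMTP_PASSWORD"))

# Clean up the temporary file after the send
unlink(tmp_csv)
```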
Is it possible to implement this workflow without Connect? If so, any resources to point to?
It is possible; what you need is a scheduler to run a script that contains the rmarkdown::render() call that runs the main .Rmd document. The scheduler might be cron on POSIX systems or Task Scheduler on Windows. Then, when setting up the scheduled task, use Rscript or littler to execute the script (see https://stackoverflow.com/questions/10116411/ for details).
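As a sketch (the script and document names here are hypothetical), the scheduled script and its crontab entry might look like:

```r
# render_and_send.R -- run on a schedule outside of Connect
# (file names and paths here are examples, not from the webinar)

# Render the main report; its final chunk composes and sends the email
rmarkdown::render("business_health.Rmd", quiet = TRUE)

# Example crontab entry to run this every weekday at 07:00:
#   0 7 * * 1-5  Rscript /home/analyst/reports/render_and_send.R
```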
This is great. Will it always fall on the data science team to set the triggers and thresholds, or is there some possibility of enabling decision makers to tweak those thresholds?
I have some ideas for this. One is having a shared Google Sheet available that contains the thresholds. This can be read into the .Rmd file through the googlesheets4 package (https://github.com/tidyverse/googlesheets4/). Another option is to create a simple Shiny app with a UI for setting the threshold values, which writes to a text file that the .Rmd file reads each time. If getting serious about thresholds and such, it might be better to use a package like pointblank (https://github.com/rich-iannone/pointblank), which has expanded options for thresholds and failure conditions, as well as reporting objects that can be included in emails.
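A sketch of the simplest variant: the .Rmd reads its thresholds from a small file that a colleague or a helper Shiny app maintains. The file name and column names here are illustrative:

```r
# (For illustration only: create the thresholds file that a Shiny app
# or a decision maker would normally maintain)
write.csv(
  data.frame(metric = c("churn", "dac"), limit = c(0.05, 1000)),
  "thresholds.csv",
  row.names = FALSE
)

# In the .Rmd: re-read the current threshold values on every render
thresholds <- read.csv("thresholds.csv", stringsAsFactors = FALSE)
churn_limit <- thresholds$limit[thresholds$metric == "churn"]

# Alternatively, read from a shared Google Sheet:
# library(googlesheets4)
# thresholds <- read_sheet("<sheet-id>")
```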
How is this approach influencing your future creation of dashboards? I'm thinking the email reduces need to see the dashboard over time.
It probably depends on the data culture of the organization. If enough needs are being served by notifications instead of dashboard views, then possibly fewer dashboards need to be created.
Can this workflow be used from RServer or only R Desktop?
This can be used from RStudio Server as well as Desktop.
Have you found that sending notifications drives more traffic to the dashboard, or do decision makers just rely on the email?
I think that decision makers routinely look at a small subset of the most high-level dashboards. If these dashboards have been carefully crafted to contain the information most relevant to the decision makers, they will be viewed pretty much every day. However, the email, when sent, may shed more light on exactly what is of concern and why (perhaps the thresholds/targets aren’t indicated in the dashboard). Also, having an email removes any remaining doubt that there is a problem, since it is process-driven, with agreed-upon thresholds.
What is triggering the build of the "subdocument"? Just the "_email" addition in the name and the use of blastula?
It is the inclusion of the render_connect_email(input = "business_health_email.Rmd") function call at the bottom of the main .Rmd file that triggers the build of the subdocument.
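As a sketch, the final chunk of the main .Rmd might look like this (the kpi_needs_attention flag is illustrative, computed earlier in the report; the document name matches the webinar example):

```r
library(blastula)

# Final chunk of the main .Rmd: decide whether Connect sends the email.
# `kpi_needs_attention` is a flag computed earlier in the report.
if (kpi_needs_attention) {
  # Render the _email subdocument and hand it to Connect for sending
  email <- render_connect_email(input = "business_health_email.Rmd")
  attach_connect_email(email, subject = "Alert: KPI needs attention")
} else {
  # Tell Connect not to send the scheduled email on this run
  suppress_scheduled_email()
}
```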
When your conditional report is generated and you have a figure and a table, would it be possible to generate a dynamic/interactive HTML on the fly and then link to that in the email?
Is it possible to control data access based on the email recipient? For instance, in retail only giving certain users access to certain stores?
If there is a link from the email to a Shiny dashboard in Connect, then only authorized viewers can see the data. For personalization, another option is to point to a Shiny dashboard that has authentication (https://shiny.rstudio.com/gallery/authentication-and-database.html).
Can you set up a query script that pulls updated data every hour, say? Do you have an example?
Yes, this is possible. In the Connect example shown in the webinar, I set up the report to ‘refresh’ every day at a certain time. However, this can be set up to run every hour. If the data changes often (i.e., new rows are being added very frequently) then the report variables/tables/charts should reflect the data that’s pulled.
Any requirements on the blastula vs. analysis code rmd's relative locations in directory?
As in the examples, so long as everything is kept in one directory there shouldn’t be any path confusion.
What are the best practices for storing email credentials when using blastula? E.g., what if you wanted to use AWS to execute R code that is using the blastula package?"
Then you won’t be using the blastula functions shown in the webinar. Instead, you would use compose_email() (with the rendered .Rmd as the body argument) and likely smtp_send(). For the credentials argument of smtp_send(), you can use the creds_envvar() function, which allows the password to be retrieved from an environment variable.
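A sketch of that SMTP workflow, using render_email() to turn the .Rmd into the email object (the addresses and host are placeholders; the password comes from the SMTP_PASSWORD environment variable):

```r
library(blastula)

# Render the .Rmd into an email object, then send it via SMTP.
# Addresses and host below are placeholders.
email <- render_email("business_health_email.Rmd")

smtp_send(
  email,
  to = "decision.maker@example.com",
  from = "reports@example.com",
  subject = "Business health update",
  credentials = creds_envvar(
    user = "reports@example.com",
    pass_envvar = "SMTP_PASSWORD",  # password read from this env var
    host = "smtp.example.com",
    port = 587,
    use_ssl = TRUE
  )
)
```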
Does this work only if you are using R Studio Connect?
No, you can also use smtp_send(). It has additional requirements (you need access to an SMTP server), but you can use .Rmd files as input or any HTML you want to create.
Will this flow work with Outlook 2016?
If you mean ‘does email sent to Outlook 2016 look okay?’ then yes.
Does gt work when knitting to LaTeX?
It has a few issues right now but we hope to address this output format later in the year.
Can you link to multiple separate RMD dashboards into a single email?
Unfortunately, the footer text only links to a single document on Connect. But you could have links to multiple dashboard pages in that document.
Are there any resources to help set up blastula with corporate email accounts? At the moment it's a bit hit-and-miss whether the data scientists in my area can get it to work.
In a previous role, I had IT set me up with a special email account for sending through the company’s SMTP server. It was pretty rock solid because we didn’t throttle our own emails. Otherwise, a good recommendation for an SMTP service is SMTP2GO.
Which email client would this work with? And will it only work with Connect, as in the email configured in Connect?
The blastula package works with a wide variety of email clients. You don't have to use Connect; blastula has functions for sending email via an SMTP server, though those weren't covered in the webinar presentation.
Would it be possible to include a link back to the dashboard, but in a state (dependent on certain user inputs) which triggered the alert?
This is possible with a Shiny-based dashboard. Create a link in the email to a dashboard that has a bookmarking state. Take a look at https://shiny.rstudio.com/articles/bookmarking-state.html for more info on this.
What are the RStudio Connect alternatives that were just mentioned?
The key pattern is to automate the rendering of an R Markdown document; so you can use any type of scheduling tool such as cron or airflow.
Do I need to have a subscription or similar for me to do something like this?
The subscription to Connect definitely takes a lot of work and worry off your hands, but it isn’t strictly necessary. It can be done with smtp_send() and access to a machine that is always on and gives you access to cron. Take a look at some of the answers to similar questions for a few more details on the necessary ingredients.
Creating a dashboard might cost quite some time. That is, if it isn't THE most important part of a report, for example, it's often canceled. How would you avoid this? That is, how would you make sure that cool dashboards can be reused/recycled? Normalised data input?
If I’m getting this right, I would try to separate the creation of the data for dashboards from the code that is used to generate the dashboards. Perhaps it’s highly aggregated data; that’s great for reports and for dashboards. And if it’s being made just once, there’s less chance of similarly aggregated data being slightly different across different data products.
So, I am extracting information from a dashboard regularly. The dashboard is developed by a government agency. Can I benefit from what you are showing here? Or do I need to have access to the dashboard's underlying data first before I can take advantage of this?
You might need to extract that data in the form you need first. At least that would make things a bit easier since it seems like it’s two distinctly separate processes. Once the data is cleaned up and some QA checks are in place, I would use that data to generate the report (that perhaps results in some email notification, depending on your criteria).
Maybe interesting for the group - but I'm using this process, but instead of emails, I'm sending it through a slack webhook, which notifies users/channels directly in Slack
That’s great! Same sort of principle applies and if Slack is a more effective venue for notifications, that’s all the better!
I suppose this scheme can be used for creating alerts to a dashboard maintainer for data quality issues, including completeness, consistency, etc.
Definitely. And if you’re interested in a package that does data quality checks, I developed one called pointblank (https://github.com/rich-iannone/pointblank). What’s helpful there is that a gt table report is generated from the validation workflow and it can be embedded in the email (showing you what went wrong, and, the extent to which data quality was above failure thresholds).
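A minimal pointblank sketch, using the package's built-in small_table dataset (the particular checks and the warning threshold are illustrative):

```r
library(pointblank)

# Build a validation agent with a 10% 'warn' threshold, add a couple
# of data quality checks, and interrogate the data
agent <- create_agent(tbl = small_table, actions = action_levels(warn_at = 0.1))
agent <- col_vals_not_null(agent, columns = vars(date))
agent <- col_vals_gt(agent, columns = vars(d), value = 0)
agent <- interrogate(agent)

# get_agent_report(agent) returns a gt-based summary table that can
# be embedded in a blastula email to show what went wrong
```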
Many organizations are moving away from email, and toward a google-docs type infrastructure. Can alerting work well in that environment?
I think so. There is a wonderful package that allows for the creation of Google Sheets (called googlesheets4, https://github.com/tidyverse/googlesheets4/). It allows you to read/write sheets, so, you could imagine a scheme where a spreadsheet could be updated with important information based on some condition.
Wondering if there's a reason why the thresholds aren't shown on the dashboard itself? For the tables, could the important information be highlighted like it is on the email?
That can certainly be done. Even so, the main limitation of a dashboards-only approach is that you have to diligently visit the dashboard in the first place.
How can I designate groups of users to send the email to?
This is configurable in the Access view in Connect. You can set a number of collaborators and watchers. Then, in the Schedule view (also in the Connect settings for the document), you choose a subset of those persons to send email to.
How do you make sure that results are formatted correctly in every email program?
We did a lot of QA testing with Litmus (https://www.litmus.com), a really great service for email testing. We were able to get email inbox results from upwards of 50 different email clients (web clients, desktop clients, mobile device clients, etc.). Because of this, we were able to rapidly iterate on fixing display issues in some problem clients and work towards making our HTML templating rock solid.
What's the best way to deliver interactive htmlwidgets by email?
Can recipients answer the email so you can have a conversation about the data?
In Connect, there is an automatic ‘reply-to’ address set (with the document-owners address) with all outgoing emails. So, yes!
Are there plans to get blastula to work with gmail without the "allow less secure apps" options? We work with healthcare data, so we can't allow that option ever.
Gmail is quite restrictive with their SMTP service. However, as an alternative, I wholeheartedly recommend the SMTP2GO service. Their entire service is about making it easy to send emails without throttling and other shenanigans. I’ve tested their service and I’m very impressed with it. It has a generous free tier and they do have customer security in mind (https://www.smtp2go.com/faq/).
Can you send this report using blastula from a microsoft outlook inbox?
Yes, you can use Outlook's SMTP service but you'll have to follow a different workflow than what was shown in the webinar. Take a look at some of the answers to similar questions for what’s required.
How difficult would it be to send emails to different groups conditionally? In this example, could an email be sent to one group if the DAC KPI needed attention and another group if the churn KPI needed attention?
For this use case, you'd want to create different versions of the report for each group. This can be done easily with parameterized R Markdown reports in Connect, or by using the blastula smtp_send() function outside of Connect.
Instead of hardcoding thresholds, they can be dynamic in nature, right? So, e.g., only send an email notification if yesterday's revenue is 1.5 SD below the mean.
Of course! For this sort of scenario, I would calculate the dynamic threshold with a little dplyr, pull the value, and use that in the conditional statement.
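For example (the data and column names are made up), a "mean minus 1.5 SD" cutoff computed with dplyr and used as the send condition:

```r
library(dplyr)

# Illustrative data: 29 days of history plus a weak day 30
daily_revenue <- data.frame(
  day = 1:30,
  revenue = c(rep(c(98, 102), 14), 100, 60)
)

# Dynamic threshold: mean minus 1.5 SD over the history
threshold <- daily_revenue %>%
  filter(day < 30) %>%                                 # exclude yesterday
  summarize(cutoff = mean(revenue) - 1.5 * sd(revenue)) %>%
  pull(cutoff)

yesterday <- daily_revenue$revenue[daily_revenue$day == 30]

# Use this flag in the conditional that triggers the email
send_alert <- yesterday < threshold
```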
Would be amazing to send text notifications. Even possible?
You mean text messages? Seems totally possible with the twilio R package (https://github.com/seankross/twilio). Take it to the limit: send a text about the email about the report!
Thank you!! This was amazingly relevant to me. I appreciate your time to produce this. It probably literally changed my life.
That is fantastic to hear! Glad you found this so useful. Please let us know how you end up using this.
Can Connect be deployed completely on-premises?
Yes, Connect can be deployed completely on-premises.
Great stuff! Does blastula require RStudio Connect? Can it be used without it? If so, what functionality requires RSConnect and what functionality is available without it?
The blastula package can be used without Connect. It’s just that Connect makes the instrumentation and scheduling much easier. Check out some replies to similar questions about ways to do something similar without Connect.
Does blastula let you send an email from within R without needing RStudio Connect?
Yes, but Connect is so great in terms of ‘set it and forget it’ that you may not want to. If you must forego the awesomeness of Connect then take a look at some of my replies to similar questions.
Hi, is there a way to automate periodic email reporting using blastula & a free R package instead of RStudio Connect? Many thanks.
You can use a computer that is always on (or at least dependably on and available at the scheduled times) to automate the sending of blastula emails. The key is to use Rscript or littler in tandem with cron/crontab.
How long does the RStudio Connect Free Trial last?
The RStudio Connect Free Trial lasts 45 days. More info, and a link to get the software, available here: RStudio Connect - RStudio
Is the email features only available in RSConnect? Any chance these might become available for future Rstudio Cloud paid accounts?
Sending emails doesn't require Connect (as was answered live in the webinar), but Connect is a great platform for doing so. We are considering adding new features to RStudio Cloud in the future, so thank you for the suggestion.