Follow-Up Thread - Beyond Dashboard Fatigue

rich_i · July 14, 2020, 5:56pm

BEYOND DASHBOARD FATIGUE

SEAN LOPP, PRODUCT MANAGER & RICH IANNONE, SOFTWARE ENGINEER AT RSTUDIO

July 14, 2020 at 1pm-2pm Eastern Time

Recording (coming soon)

Github Repo

What You'll Learn

Data science teams face a challenging task. Not only do they have to gain insight from data, they also have to persuade others to make decisions based on those insights. To close this gap, teams rely on tools like dashboards, apps, and APIs. But unfortunately data organizations can suffer from their own success - how many of those dashboards are viewed once and forgotten? Is a dashboard of dashboards really the right solution? And what about that pesky, precisely formatted Excel spreadsheet finance still wants every week?

In this webinar, we’ll show you an easy way teams can solve these problems using proactive email notifications through the blastula and gt packages, and how RStudio pro products can be used to scale out those solutions for enterprise applications. Dynamic emails are a powerful way to meet decision makers where they live - their inbox - while displaying exactly the results needed to influence decision-making. Best of all, these notifications are crafted with code, ensuring your work is still reproducible, durable, and credible.

We’ll demonstrate how this approach provides solutions for data quality monitoring, detecting and alerting on anomalies, and can even automate routine (but precisely formatted) KPI reporting.

Questions and Answers

In writing to the csv that gets attached, is there a way to just write that CSV to memory or attach it directly to the email from a temp file that gets deleted after the script runs?

Currently, this has to use a file.

Is it possible to implement this workflow without Connect? If so, any resources to point to?

It is possible, what you need is a scheduler to run a script that contains the rmarkdown::render() call that runs the main .Rmd document. The scheduler might be cron on POSIX systems and Windows Task Scheduler on Windows. Then, when setting up the scheduled task, use Rscript or littler to execute the script (see Schedule R script using cron - Stack Overflow for details).

This is great. Will it always fall on the data science team to set the triggers and thresholds, or is there some possibility of enabling decisionmakers to tweak those thresholds?

I have some ideas for this. One is having a shared Google sheet available that contains the thresholds. This can be read into the .Rmd file through the googlesheets4 package (GitHub - tidyverse/googlesheets4: Google Spreadsheets R API (reboot of the googlesheets package)). Another option is to create a simple Shiny app with UI for setting the threshold values. This will write to a text file that the .Rmd file will read each time. If getting serious about thresholds and such, it might be better to use a package like pointblank (GitHub - rstudio/pointblank: Data quality assessment and metadata reporting for data frames and database tables) which has expanded options for thresholds and failure conditions, and, reporting objects that can be included in emails.

How is this approach influencing your future creation of dashboards? I'm thinking the email reduces need to see the dashboard over time.

It probably depends on the data culture of the organization. If there are enough needs being served by notifications instead of viewing dashboards then possibly less dashboards need to be created.

Can this workflow be used from RServer or only R Desktop?

This can be used from RStudio Server as well as Desktop.

Have you found that sending notifications drives more traffic to the dashboard, or do decision makers just rely on the email?

I think that decision makers routinely look at a small subset of the most high-level dashboards. If these dashboard has been carefully crafted to contain information most relevant to the decision makers, they will be viewed pretty much every day. However, the email when sent may shed more light as to exactly what is of concern and why (perhaps the thresholds/targets aren’t indicated in the dashboard). Also, having an email removes any remaining doubt that there is a problem since it is process-driven, with agreed upon thresholds.

What is triggering the build of the "subdocument"? Just the "_email" addition in the name and the use of blastula?

It is the inclusion of the filename in the render_connect_email(input = “business_health_email.Rmd") function call at the bottom of the main .Rmd file that triggers the build of the subdocument.

When your conditional report is generated and you have a figure and a table, would it be possible to generate a dynamic/interactive HTML on the fly and then link to that in the email?

Yes, you can use a table format in the main document that has controls for sorting, filtering, pagination, etc. (This shouldn’t be done in the email subdocument, since email clients won’t be able to render HTML content that has JavaScript.)

Is it possible to control data access based on the email recipient? For instance, in retail only giving certain users access to certain stores?

If there is a link from the email to a Shiny dashboard in Connect, then only authorized viewers can see the data. For personalization, another option is to point to a Shiny dashboard that has authentication (https://shiny.rstudio.com/gallery/authentication-and-database.html).

Can you set up a query script that pulls updated data every - say hour? Do you have an example.

Yes, this is possible. In the Connect example shown in the webinar, I set up the report to ‘refresh’ every day at a certain time. However, this can be set up to run every hour. If the data changes often (i.e., new rows are being added very frequently) then the report variables/tables/charts should reflect the data that’s pulled.

Any requirements on the blastula vs. analysis code rmd's relative locations in directory?

As in the examples, so long as everything is kept in one directory there shouldn’t be any path confusion.

What are the best practices for storing email credentials when using blastula? E.g., what if you wanted to use AWS to execute R code that is using the blastula package?"

Then you won’t be using the blastula functions shown in the webinar. Instead you would use compose_email() (the .Rmd is body argument) and likely smtp_send(). For the credentials argument of smtp_send() you can use the creds_envvar() function that allows the password to be retrieved from an environment variable.

Does this work only if you are using R Studio Connect?

No, you can also use compose_email() and smtp_send(). They have additional requirements (like, you need to have access to an SMTP server) but you can use .Rmd files as input, any HTML you want to create, or, blocks().

Will this flow work with Outlook 2016?

If you mean ‘does email sent to Outlook 2016 look okay?’ then yes.

Does gt work when knitting to LaTeX?

It has a few issues right now but we hope to address this output format later in the year.

Can you link to multiple separate RMD dashboards into a single email?

Unfortunately, the footer text only links to a single document on Connect. But you could have links to multiple dashboard pages in that document.

Are there any resources to help setup blastula with corporate email accounts? It's a bit hit and miss for the data scientists in my area at the moment if they can get it to work

In a previous role, I had IT set me up with a special email account for sending through the company’s SMTP server. It was pretty rock solid because we didn’t throttle our own emails. Otherwise, a good recommendation for an SMTP service is SMTP2GO.

Which email client would this work with? And will it only work with Connect, as in the email configured in Connect?

The blastula package works with a wide variety of email clients; you don't have to use Connect, the blastula package has functions for sending email via a SMTP server, but those weren't covered in the webinar presentation.

Would it be possible to include a link back to the dashboard, but in a state (dependent on certain user inputs) which triggered the alert?

This is possible with a Shiny-based dashboard. Create a link in the email to a dashboard that has a bookmarking state. Take a look at Shiny - Bookmarking state for more info on this.

What are the RStudio Connect alternatives that were just mentioned?

The key pattern is to automate the rendering of an R Markdown document; so you can use any type of scheduling tool such as cron or airflow.

Do I need to have a subscription or similar for me to do something like this?

The subscription to Connect definitely takes a lot of work and worry off your hands but it isn’t strictly necessary. It can be done with compose_email(), smtp_send(), access to a machine that is always on and gives you access to cron. Take a look at the some of the answers to similar questions for a few more details on the necessary ingredients.

Creating a dashboard might cost quite some time. That is, if it isn't THE most important part of a report, for example, it's often canceled. How would you avoid this? That is, how would you make sure that cool dashboards can be reused/recycled? Normalised data input?

If I’m getting this right, I would try to separate the creation of the data for dashboards from the code that is used to generate the dashboards. Perhaps it’s highly aggregated data. That’s great for reports and for dashboards. And if it’s being made just once, there’s less change of similarly aggregated data being slightly different across different data products.

So.. I am extracting information from a dashboard regularly. The dashboard is developed by a govt. agency. Can I get benefitted from what you are showing here? Or, do I need to have access to the dashboard information first and then I can get advantage of this.

You might need to extract that data in the form you need first. At least that would make things a bit easier since it seems like it’s two distinctly separate processes. Once the data is cleaned up and some QA checks are in place, I would use that data to generate the report (that perhaps results in some email notification, depending on your criteria).

Maybe interesting for the group - but I'm using this process, but instead of emails, I'm sending it through a slack webhook, which notifies users/channels directly in Slack

That’s great! Same sort of principle applies and if Slack is a more effective venue for notifications, that’s all the better!

I suppose this scheme can be used for creating alerts to a dashboard maintainer for data quality issues, including completeness, consistency, etc.

Definitely. And if you’re interested in a package that does data quality checks, I developed one called pointblank (GitHub - rstudio/pointblank: Data quality assessment and metadata reporting for data frames and database tables). What’s helpful there is that a gt table report is generated from the validation workflow and it can be embedded in the email (showing you what went wrong, and, the extent to which data quality was above failure thresholds).

Many organizations are moving away from email, and toward a google-docs type infrastructure. Can alerting work well in that environment?

I think so. There is a wonderful package that allows for the creation of Google Sheets (called googlesheets4, GitHub - tidyverse/googlesheets4: Google Spreadsheets R API (reboot of the googlesheets package)). It allows you to read/write sheets, so, you could imagine a scheme where a spreadsheet could be updated with important information based on some condition.

Wondering if there's a reason why the thresholds aren't shown on the dashboard itself? For the tables, could the important information be highlighted like it is on the email?

That can certainly be done. Even so, the main limitation with this dashboards-first/-only approach is that you have to diligently visit the dashboard in the first place.

How can I designate groups of users to send the email to.

This is configurable in the Access view in Connect. You can set a number of collaborators and watchers. Then, in the Schedule view (also in the Connect settings for the document), you choose a subset of those persons to send email to.

How do you make sure that results are formatted correctly in every email program?

We did a lot of QA testing with Litmus (https://www.litmus.com), a really great service for email testing. We were able to get email inbox results from upwards of 50 different email clients (web clients, desktop clients, mobile device clients, etc.). Because of this, we were able to rapidly iterate on fixing display issues in some problem clients and work towards making our HTML templating rock solid.

What's the best way to deliver interactive htmlwidgets by email?

Unfortunately most email clients are not compatible with JavaScript; so htmlwidgets tends to be a bad fit for embedding directly in emails. But, you can send animated gifs and animated PNGs (with work stuff in it) which can be used to present some useful information (you have to get really creative to make this work well though).

Can recipients answer the email so you can have a conversation about the data?

In Connect, there is an automatic ‘reply-to’ address set (with the document-owners address) with all outgoing emails. So, yes!

Are there plans to get blastula to work with gmail without the "allow less secure apps" options? We work with healthcare data, so we can't allow that option ever.

Gmail is quite restrictive with their SMTP service. However, as an alternative, I wholeheartedly recommend the SMTP2GO service. Their entire service is about making it easy to send emails without throttling and other shenanigans. I’ve tested their service and I’m very impressed with it. It has a generous free tier and they do have customer security in mind (Frequently Asked Questions - SMTP2GO).

Can you send this report using blastula from a microsoft outlook inbox?

Yes, you can use Outlook's SMTP service but you'll have to follow a different workflow than what was shown in the webinar. Take a look at some of the answers to similar questions for what’s required.

How difficult would it be to send emails to different groups conditionally? In this example, could an email be sent to one group if the DAC KPI needed attention and another group if the churn KPI needed attention?

For this use case, you'd want to create different versions of the report for each group. This can be done easily with parameterized R Markdown reports in Connect; or using the blastula smtp_send() function outside of Connect.

Instead of hardcoding thresholds, they can be dynamic in nature, right? So f.e. only send email notification if yesterday's revenue is 1.5 sd below the mean.

Of course! For this sort of scenario, I would calculate the dynamic threshold with a little dplyr, pull the value, and use that in the conditional statement.

Would be amazing to send text notifications. Even possible?

You mean text messages? Seems totally possible with the twilio R package (GitHub - seankross/twilio: ☎ An interface to the Twilio API for R). Take it to the limit: send a text about the email about the report!

Thank you!! This was amazingly relevant to me. I appreciate your time to produce this. It probably literally changed my life.

That is fantastic to hear! Glad you found this so useful. Please let us know how you end up using this.

Can be connect be deployed completely on-premise?

Yes, Connect can be deployed complete on-premise.

Great stuff! Does blastula require RStudio Connect? Can it be used without it? If so, what functionality requires RSConnect and what functionality is available without it?

The blastula package can be used without Connect. It’s just that Connect makes the instrumentation and scheduling much easier. Check out some replies to similar questions about ways to do something similar without Connect.

Does blastula let you send an email from within R without needing RStudio Connect?

Yes, but Connect is so great in terms of ‘set it and forget it’ that you may not want to. If you must forego the awesomeness of Connect then take a look at some of my replies to similar questions.

Hi, is there a way to automate periodic email reporting using blastula & a free R package instead of RStudio Connect? Many thanks.

You can use a computer that is always on (or, at least dependably on and available at the scheduled time(s)) to automate the sending of blastula emails. Key is to use Rscript or littler in tandem with cron/crontab.

How long does the RStudio Connect Free Trial last?

The RStudio Connect Free Trial lasts 45 days. More info, and a link to get the software, available here: Posit Connect - Posit

Is the email features only available in RSConnect? Any chance these might become available for future Rstudio Cloud paid accounts?

Sending emails doesn't require Connect (as was answered live in the Webinar), but Connect is a great platform to do so. We are considering adding new features to RStudio Cloud in the future, so thank you for that suggestions.

Cecinerock · July 14, 2020, 6:06pm

Hi,

The webinar was very interesting.

Would it be possible to know if is there a way to automate periodic email reporting from RStudio or a Shiny app using blastula (& gt for formated DT) and a free R package?

Many thanks

EconomiCurtis · July 17, 2020, 3:46pm

You can use a computer that is always on (or, at least dependably on and available at the scheduled time(s)) to automate the sending of blastula emails. Key is to use Rscript or littler in tandem with cron/crontab.

john.smith · July 23, 2020, 6:51am

Hi,

I just recently watched this and it looks very interesting
Normally I would use RDCOMClient but would have to create a rather shabby HTML body to insert into the email.

I work with a large corporation and we use exchange. Does blastula have the capability to work with exchange. Is there a setup on my local machine i can implement?

Thanks

rich_i · July 28, 2020, 4:52pm

We don't have an Exchange environment set up here to verify that it can be done, but, it seems like it should be possible based on this: https://stackoverflow.com/questions/59000292/sending-mail-with-curl-on-windows-via-ms-exchange-server

In that SO thread, curl at the command line is being used to send email using an SMTP server provided through Exchange. The blastula package uses curl to ultimately send the email through SMTP, and you would have to specify the SMTP settings and credentials through one of blastula's credentials functions (e.g., creds(), creds_envvar(), etc.). To help with this, try sending out a test message with prepare_test_message() %>% smtp_send(...) (the verbose = TRUE option in smtp_send() will be helpful to debug, giving you messages emitted from curl).

john.smith · July 29, 2020, 2:36pm

Thank you @rich_i

I shall give it a try and let you know

EconomiCurtis · August 25, 2020, 6:05pm

This topic was automatically closed after 42 days. New replies are no longer allowed.