This is a companion discussion topic for the original entry at https://blog.rstudio.com/2020/11/12/cloud-strategy
Why Do Organizations Want to Move to the Cloud?
There are many reasons why organizations are looking to use cloud services more widely for data science. They include:
- Long delays and high startup costs for new data science teams: When you bring a new team of data scientists on board, it can be costly and time-consuming to spin up the necessary hardware for the team. New hardware might be needed for developing data science analyses or for sharing interactive Shiny applications with stakeholders. These burdens tend to fall either on the individual data scientists or on the DevOps and IT administrators who are responsible for configuring servers.
- Obstacles to collaboration between organizations or groups: If a team is restricted to operating within their organization’s firewall, it can be very difficult to support collaboration or instruction between groups that don’t normally interact with each other. For example, running a data science workshop or statistics class can be unwieldy if everyone is working within their own separate environments.
- High costs of computing infrastructure: Another key challenge is the potentially high costs of setting up and maintaining an organization’s computing infrastructure, including both hardware and software. These costs include the initial investments, maintenance and upgrade fees, and the related manpower costs.
- Difficulty scaling to meet variable demand: Scaling server resources to satisfy highly variable data science demands can be very difficult because organizations rarely maintain excess capacity. For example, an organization may want to publish a news article or a COVID dashboard for which they expect high demand, only to discover that it needs the IT organization to spin up a back-end Kubernetes cluster to handle the load.
- Excessive time and costs moving the data to the analysis: If an organization’s data is already stored on one of the major cloud providers or in a remote data center, moving that data to a laptop for analysis can be slow and expensive. Ideally, you should perform the data access, transformation and analysis as close to where the data lives as possible. Not doing so could subject you to excessive data transfer charges.
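To make the last point concrete, here is a minimal Python sketch of the difference between moving the data to the analysis and running the analysis where the data lives. It is an illustration only: an in-memory SQLite database stands in for a remote cloud data store, and the `sales` table and its columns are hypothetical.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a remote cloud data store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")  # hypothetical table
rows = [("east" if i % 2 == 0 else "west", float(i)) for i in range(10_000)]
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

# Option A: move every row to the client, then aggregate locally.
all_rows = conn.execute("SELECT region, amount FROM sales").fetchall()
local_totals = {}
for region, amount in all_rows:
    local_totals[region] = local_totals.get(region, 0.0) + amount

# Option B: aggregate where the data lives, so only the summary is transferred.
remote_totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall()
)

print(f"rows transferred, option A: {len(all_rows)}")
print(f"rows transferred, option B: {len(remote_totals)}")
```

Both options produce the same totals, but option B transfers one row per region instead of the whole table. Against a real remote store, that difference is what shows up as transfer time and data transfer charges.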
Let Your Data Science Goals Drive Your Cloud Strategy
Depending on the circumstances of your organization and what specific challenges you are trying to address, you should consider four possible options for your data science cloud strategy:
- Hosted and Software as a Service (SaaS) offerings: A fully hosted service can minimize the cost and time required to start up a new project. However, functionality may be limited compared to on-premises offerings, and integration with your internal data and infrastructure can be challenging.
- Deployment to a Virtual Private Cloud (VPC) provider: Deploying software on a major cloud platform such as Amazon Web Services (AWS) or Azure can provide the full flexibility and customization of on-premises software. However, a virtual private cloud deployment often requires more management overhead to integrate with your internal systems, as well as careful monitoring of usage to avoid unexpected charges.
- Cloud marketplace offerings: Pre-built applications offered on services such as the AWS and Azure Marketplaces make it easy to get started at a pay-as-you-go hourly cost, but they require careful management to ensure the software is available and running only when needed.
- Data science in your data lake: By embedding your data science tools into your existing data platform, you can run computations close to the data, minimize overhead, and easily tie into your data pipeline. However, this adds complexity and potential limitations.
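The point about running pay-as-you-go software only when needed comes down to simple arithmetic, which the following Python sketch works through. The hourly rate and the usage patterns are hypothetical placeholders, not prices from any marketplace listing.

```python
def monthly_cost(hourly_rate: float, hours_per_day: float, days_per_month: int) -> float:
    """Cost of an instance that is billed only for the hours it runs."""
    return hourly_rate * hours_per_day * days_per_month


HOURLY_RATE = 0.50  # hypothetical price in USD per hour, not a real quote

# Left running around the clock for a 30-day month.
always_on = monthly_cost(HOURLY_RATE, hours_per_day=24, days_per_month=30)

# Started and stopped around an 8-hour working day, 22 working days a month.
business_hours = monthly_cost(HOURLY_RATE, hours_per_day=8, days_per_month=22)

print(f"always on:      ${always_on:.2f}")
print(f"business hours: ${business_hours:.2f}")
```

With these assumed numbers, the always-on instance costs $360.00 a month and the business-hours instance costs $88.00, which is why stopping idle instances matters with hourly billing.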
We’ve provided the table below to help you assess the various RStudio cloud offerings. It matches up problems and potential solutions with specific RStudio options and resources to consider. The options are arranged in order of increasing complexity of configuration and administration.
Table 1: Summary of Cloud Options for RStudio Software

Problem | Potential Solution | Pros and Cons | Options to consider |
---|---|---|---|
Simplify and reduce startup costs | SaaS/Hosted offering | Pros:<br>Cons: | Create data science analyses with RStudio Cloud<br>Share Shiny applications with shinyapps.io<br>Manage packages with RStudio Public Package Manager, a free service that provides easy installation of package binaries and access to previous package versions |
Promote collaboration or instruction between organizations or groups | SaaS/Hosted offering | Pros:<br>Cons: | Share projects or teach classes/workshops with RStudio Cloud |
Mitigate high costs of computing infrastructure | Marketplace offerings | Pros:<br>Cons: | RStudio products on AWS Marketplace, Azure Marketplace, and Google Cloud Platform |
Mitigate high costs of computing infrastructure | Deployment to a VPC on a major cloud provider | Pros:<br>Cons: | Deploy RStudio products in a VPC, using CloudFormation templates for AWS and ARM templates for Azure (see RStudio Cloud Tools)<br>Deploy RStudio products via Docker, e.g. on EKS (Elastic Kubernetes Service) on AWS (see Docker images for RStudio Professional Products)<br>Connect to cloud-based data storage, such as Redshift or S3 |
Scale to meet variable demand | Clustering approaches, including Kubernetes | Pros:<br>Cons: | In addition to the points above, RStudio Server Pro's Launcher integrates with Kubernetes, an industry-standard clustering solution that allows efficient scaling.<br>RStudio Connect provides many options to scale and tune performance, including being part of an autoscaling group. These options allow Connect to deliver dashboards, Shiny applications, and other types of content to large numbers of users. |
Minimize data movement | Data lakes | Pros:<br>Cons: | RStudio Server Pro in Qubole Data Platform, for Azure, AWS and GCP<br>Connect to cloud-based data storage, such as Redshift or S3<br>Managed RStudio Server Pro on Spark and Hadoop on Azure and AWS (Cazena) |
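For readers unfamiliar with how a cluster scales to meet variable demand, the sketch below shows the core replica calculation documented for the Kubernetes Horizontal Pod Autoscaler: the desired replica count is the current count scaled by the ratio of the observed metric to the target metric, rounded up. This is a simplified Python illustration, not how RStudio Server Pro or RStudio Connect is configured. The function name, the default bounds, and the example numbers are assumptions, and the real controller also applies a tolerance and stabilization rules that are omitted here.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,   # assumed lower bound for this sketch
    max_replicas: int = 10,  # assumed upper bound for this sketch
) -> int:
    """Scale the replica count by the ratio of observed load to target load."""
    raw = math.ceil(current_replicas * current_metric / target_metric)
    # Clamp to the configured bounds so the cluster never scales without limit.
    return max(min_replicas, min(max_replicas, raw))


# Two replicas averaging 90% CPU against a 50% target: scale out.
print(desired_replicas(2, current_metric=90, target_metric=50))

# Four replicas averaging 10% CPU against a 50% target: scale in.
print(desired_replicas(4, current_metric=10, target_metric=50))
```

The upper bound is what keeps a sudden spike in dashboard traffic from turning into an unbounded bill, which ties back to the usage-cost caveat for cloud deployments noted earlier.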
Our product team is also happy to provide advice and guidance along this journey. If you’d like to set up a time to talk with us, you can book a time here. We look forward to being your guide.