RStudio believes that you can attain greater business intelligence with interoperable tools that take full advantage of open-source data science. Your organization may rely on Tableau for reporting, but how can you ensure that you're using the full power of your data science team's R and Python models in your dashboards?
With the plumbertableau package (and its corresponding Python package, fastapitableau), you can use functions or models created in R or Python from Tableau through an API. These packages allow you to showcase cutting-edge data science results in your organization’s preferred dashboard tool.
While this post focuses on R, anything possible with R and plumbertableau is also doable with Python and fastapitableau.
Foster Data Analytics Capabilities With plumbertableau
With plumbertableau, you can fully develop your model with code-first data science. The package uses plumber to create an API directly from your code. Since your model is fully developed in your data science editor, it can use all the packages and complex calculations it needs.
Because your model is not constrained by Tableau's environment, you can use the full range of R's capabilities to extract the best data science results.
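To give a flavor of what this looks like, here is a minimal sketch of a plumbertableau extension. The /double endpoint and its doubling logic are purely illustrative assumptions, following the annotation pattern covered in the walkthrough later in this post:

library(plumber)
library(plumbertableau)

#* @tableauArg x:[numeric] Values passed in from Tableau
#* @tableauReturn [numeric] The input values, doubled
#* @post /double
function(x) {
  x * 2
}

#* @plumber
tableau_extension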
Improve Data Quality With APIs for Continuous Use
Seamless integration between analytic platforms prevents issues like using outdated, inaccurate, or incomplete data. Rather than depending on a manual process, data scientists can depend on their data pipelines to ensure data integrity.
With plumbertableau, your tools are integrated through an API. The Tableau dashboard displays results without any intermediate manipulation like copy-and-pasting code or uploading datasets. You can work in confidence knowing your results are synchronized, accurate, and reproducible.
Increase Deliverability by Streamlining Data Pipelines
If your model has many dependencies or versioning requirements, it can be difficult to handle them outside of the development environment. Debugging is even more time-consuming when you need to work in separate environments to figure out what went wrong.
With RStudio Connect, you can publish plumbertableau extensions directly from the RStudio IDE. RStudio Connect automatically manages your API's dependent packages and files, recreating an environment that closely mimics your local development environment. And since all your R code remains in R, you can use your usual data science techniques to efficiently resolve issues.
Read more on the Hosting page of the plumber package.
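If you publish by means other than the IDE's push-button flow (for example, git-backed deployment), a manifest file is what lets Connect recreate your package environment. A hedged sketch, assuming your extension lives in its own directory:

# Snapshot the extension's package dependencies so RStudio Connect
# can restore a matching environment (run from the project root;
# the path below is a placeholder)
rsconnect::writeManifest(appDir = "path/to/extension")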
How to Use plumbertableau: XGBoost with Dynamic Model Output Example
In this walkthrough, we will use data from the Seattle Open Data Portal to predict the paid parking occupancy percentage in various areas around the city. We will train an XGBoost model in RStudio, create a plumbertableau extension to embed in Tableau, and visualize and interact with the model in a Tableau dashboard. The code is included for reproducibility; note that you will need an RStudio Connect account to complete the walkthrough.
The plumbertableau and fastapitableau packages have wonderful documentation. Be sure to read it for more information on:
- The anatomy of the extensions
- Details on setting up RStudio Connect and Tableau
- Other examples to try out in your Tableau dashboards
1. Build the model
First, we need to build a model. This walkthrough won't cover how to create, tune, or validate a model. If you'd like to learn more about models and machine learning, check out the tidymodels website and Julia Silge's fantastic screencasts and tutorials.
Load Libraries
library(tidyverse)
library(RSocrata)
library(lubridate)
library(usemodels)
library(tidymodels)
Download and Clean Data
The Seattle Open Data Portal uses Socrata, a data management tool, for its APIs. We can use the RSocrata package to download the data.
# Pull parking occupancy records for a subset of locations via the Socrata API
parking_data <- RSocrata::read.socrata(
  "https://data.seattle.gov/resource/rke9-rsvs.json?$where=sourceelementkey <= 1020"
)
parking_id <-
parking_data %>%
group_by(blockfacename, location.coordinates) %>%
mutate(id = cur_group_id()) %>%
ungroup()
parking_clean <-
parking_id %>%
mutate(across(c(parkingspacecount, paidoccupancy), as.numeric),
occupancy_pct = paidoccupancy / parkingspacecount) %>%
group_by(
id = id,
hour = as.numeric(hour(occupancydatetime)),
month = as.numeric(month(occupancydatetime)),
dow = as.numeric(wday(occupancydatetime)),
date = date(occupancydatetime)
) %>%
summarize(occupancy_pct = mean(occupancy_pct, na.rm = TRUE)) %>%
drop_na() %>%
ungroup()
We will also need information on the city blocks, so let’s create that dataset.
parking_information <-
parking_id %>%
mutate(loc = location.coordinates) %>%
select(id, blockfacename, loc) %>%
distinct(id, blockfacename, loc) %>%
unnest_wider(loc, names_sep = "") # split the coordinate list into loc1 and loc2
Create Training Data
Now, let’s create the training set from our original data.
parking_split <-
parking_clean %>%
arrange(date) %>%
select(-date) %>%
initial_time_split(prop = 0.75)
Train and Tune the Model
Here, we train and tune the model. We select the model with the best RMSE to use in our dashboard.
# Preprocessing: remove zero-variance predictors
xgboost_recipe <-
  recipe(formula = occupancy_pct ~ ., data = parking_clean) %>%
  step_zv(all_predictors()) %>%
  prep()

# Five-fold cross-validation on the baked training data
xgboost_folds <-
  recipes::bake(xgboost_recipe,
                new_data = training(parking_split)) %>%
  rsample::vfold_cv(v = 5)

# Model specification with tunable hyperparameters
xgboost_model <-
  boost_tree(
    mode = "regression",
    trees = 1000,
    min_n = tune(),
    tree_depth = tune(),
    learn_rate = tune(),
    loss_reduction = tune()
  ) %>%
  set_engine("xgboost", objective = "reg:squarederror")

xgboost_params <-
  parameters(min_n(),
             tree_depth(),
             learn_rate(),
             loss_reduction())

# Candidate hyperparameter combinations
xgboost_grid <-
  grid_max_entropy(xgboost_params,
                   size = 5)

xgboost_wf <-
  workflows::workflow() %>%
  add_model(xgboost_model) %>%
  add_formula(occupancy_pct ~ .)

# Evaluate each candidate across the resamples
xgboost_tuned <- tune::tune_grid(
  object = xgboost_wf,
  resamples = xgboost_folds,
  grid = xgboost_grid,
  metrics = yardstick::metric_set(rmse, rsq, mae),
  control = tune::control_grid(verbose = TRUE)
)

# Keep the hyperparameters with the lowest RMSE
xgboost_best <-
  xgboost_tuned %>%
  tune::select_best("rmse")

xgboost_final <-
  xgboost_model %>%
  finalize_model(xgboost_best)
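If you would like to see how the candidate models compared before settling on one, collect_metrics() from tune summarizes the resampling results. This is an optional check, not part of the original walkthrough:

# Average RMSE across folds for each hyperparameter combination
xgboost_tuned %>%
  tune::collect_metrics() %>%
  dplyr::filter(.metric == "rmse") %>%
  dplyr::arrange(mean)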
We bundle the recipe and fitted model in an object so we can use it later.
train_processed <-
bake(xgboost_recipe, new_data = training(parking_split))
prediction_fit <-
xgboost_final %>%
fit(formula = occupancy_pct ~ .,
data = train_processed)
model_details <- list(model = xgboost_final,
recipe = xgboost_recipe,
prediction_fit = prediction_fit)
Save Objects for the plumbertableau Extension
We’ll want to save our data and our model so that we can use them in the extension. If you have an RStudio Connect account, the pins package is a great choice for saving these objects.
rsc <- pins::board_rsconnect(server = Sys.getenv("CONNECT_SERVER"),
                             key = Sys.getenv("CONNECT_API_KEY"))
pins::pin_write(
board = rsc,
x = model_details,
name = "seattle_parking_model",
description = "Seattle Occupancy Percentage XGBoost Model",
type = "rds"
)
pins::pin_write(
board = rsc,
x = parking_information,
name = "seattle_parking_info",
description = "Seattle Parking Information",
type = "rds"
)
2. Create a plumbertableau Extension
Next, we will use our model to create a plumbertableau extension. As noted previously, a plumbertableau extension is a plumber API with some special annotations.
Create an R script called plumber.R. At the top, we list the libraries we'll need.
library(plumber)
library(pins)
library(tibble)
library(xgboost)
library(lubridate)
library(dplyr)
library(tidyr)
library(tidymodels)
library(plumbertableau)
We want to bring in our model details and our data. If you pinned your data under a different name, change the name of the pin below.
rsc <- pins::board_rsconnect(server = Sys.getenv("CONNECT_SERVER"),
                             key = Sys.getenv("CONNECT_API_KEY"))
xgboost_model <-
pins::pin_read("isabella.velasquez/seattle_parking_model", board = rsc)
Now, we add our annotations. Note that we use plumbertableau annotations, which are slightly different from the ones in plumber:
- We use tableauArg rather than params.
- We specify what is returned to Tableau with tableauReturn.
- We must use post as the request method for the endpoint.
#* @apiTitle Seattle Parking Occupancy Percentage Prediction API
#* @apiDescription Return the predicted occupancy percentage at various Seattle locations
#* @tableauArg block_id:integer numeric block ID
#* @tableauArg ndays:integer number of days in the future for the prediction
#* @tableauReturn [numeric] Predicted occupancy rate
#* @post /pred
Now, we create our function with the arguments block_id and ndays. These will have corresponding arguments in Tableau. The function will output our predicted occupancy percentage, which is what we will visualize and interact with in the dashboard.
This function takes the city block and number of days in the future to give us the predicted occupancy percentage at that time.
function(block_id, ndays) {
  # Project the requested number of days into the future
  times <- Sys.time() + lubridate::ddays(ndays)

  current_time <-
    tibble::tibble(times = times,
                   id = block_id)

  # Build the feature row that the recipe expects
  current_prediction <-
    current_time %>%
    transmute(
      id = id,
      hour = hour(times),
      month = month(times),
      dow = wday(times),
      occupancy_pct = NA
    ) %>%
    bake(xgboost_model$recipe, .)

  # Predict with the fitted model and return a single value
  parking_prediction <-
    xgboost_model$prediction_fit %>%
    predict(new_data = current_prediction)

  predictions <- parking_prediction$.pred
  predictions[[1]]
}
Finally, we finish off our script with the extension footer needed for plumbertableau extensions.
#* @plumber
tableau_extension
Here is the full plumber.R script:
library(plumber)
library(pins)
library(tibble)
library(xgboost)
library(lubridate)
library(dplyr)
library(tidyr)
library(tidymodels)
library(plumbertableau)
rsc <-
pins::board_rsconnect(server = Sys.getenv("CONNECT_SERVER"),
key = Sys.getenv("CONNECT_API_KEY"))
xgboost_model <-
pins::pin_read("isabella.velasquez/seattle_parking_model", board = rsc)
#* @apiTitle Seattle Parking Occupancy Percentage Prediction API
#* @apiDescription Return the predicted occupancy percentage at various Seattle locations
#* @tableauArg block_id:integer numeric block ID
#* @tableauArg ndays:integer number of days in the future for the prediction
#* @tableauReturn [numeric] Predicted occupancy rate
#* @post /pred
function(block_id, ndays) {
  times <- Sys.time() + lubridate::ddays(ndays)

  current_time <-
    tibble::tibble(times = times,
                   id = block_id)

  current_prediction <-
    current_time %>%
    transmute(
      id = id,
      hour = hour(times),
      month = month(times),
      dow = wday(times),
      occupancy_pct = NA
    ) %>%
    bake(xgboost_model$recipe, .)

  parking_prediction <-
    xgboost_model$prediction_fit %>%
    predict(new_data = current_prediction)

  predictions <-
    parking_prediction$.pred

  predictions[[1]]
}
#* @plumber
tableau_extension
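Before publishing, you can serve the extension locally and send it the kind of request Tableau will make. This is a hedged sketch: the port, block ID, and day horizon are illustrative assumptions, and the request shape follows the TabPy-style protocol that plumbertableau implements, in which positional arguments arrive as _arg1, _arg2, and so on.

library(plumber)
library(httr)

# In one R session, serve the extension locally (this call blocks):
# pr("plumber.R") %>% pr_run(port = 8000)

# In another session, simulate Tableau's call to the /evaluate endpoint
resp <- POST(
  "http://localhost:8000/evaluate",
  body = list(
    script = "/pred",
    data = list(`_arg1` = list(42L), `_arg2` = list(3L))
  ),
  encode = "json"
)
content(resp)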
3. Host your API
We have to host our API so that it can be accessed in Tableau. In our case, we publish it to RStudio Connect.
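Besides push-button publishing from the IDE, one way to do this programmatically is with the rsconnect package. A hedged sketch, assuming your Connect account is already configured and the extension lives in its own directory:

# Deploy the directory containing plumber.R to RStudio Connect
# (the path below is a placeholder)
rsconnect::deployAPI(api = "path/to/extension")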
Once hosted, plumbertableau automatically generates a documentation page. Notice that the SCRIPT_* value is not R code; it is a Tableau command that we will use to connect our extension and Tableau.
4. Create a calculated field in Tableau
There are a few steps you need to take so that Tableau can use your plumbertableau extension. If you are using RStudio Connect, read the documentation on how to configure RStudio Connect as an analytic extension.
Create a new workbook and upload the parking_information file. Under Analysis, turn off Aggregate Measures. Drop Lat into Rows and Lon into Columns, which will create a map. Save the workbook.
Make sure your workbook knows to connect to RStudio Connect by going to Analysis > Manage Analytic Extensions Connection > Choose a Connection. Then, select your Connect account.
Drag Id into the "Detail" mark. Create a parameter called "Days in the Future"; we're using our model to predict the parking occupancy percentage for that date. Show the parameter on the worksheet.
Create a calculated field using the SCRIPT_REAL command from the plumbertableau documentation page:
SCRIPT_REAL("/plumbertableau-xgboost-example/pred", block_id, ndays)
For each tableauArg we listed in the extension, we replace it with its corresponding Tableau value. If you're following along, this means block_id becomes ATTR([Id]) and ndays becomes ATTR([Days in the Future]).
SCRIPT_REAL("/plumbertableau-xgboost-example/pred", ATTR([Id]), ATTR([Days in the Future]))
5. Run model and visualize results in Tableau
That’s it! Once you embed your extension in Tableau’s calculated fields, you can use your model’s results in your Tableau dashboard like any other measure or dimension.
We can change the ndays argument to get new predictions from our XGBoost model and display them on our Tableau dashboard.
You can style your Tableau dashboard and provide your users with something that is not only aesthetically pleasing but also dynamically calculates predictions based on a model you created in R.
Conclusion
With plumbertableau, you can showcase sophisticated model results that are easy to integrate, debug, and reproduce. Your work will be at the forefront of data science while being visualized in Tableau’s easy, point-and-click interface.
Learn More
Watch James Blair showcase plumbertableau in his talk, Leveraging R & Python in Tableau with RStudio Connect.
More on how RStudio supports interoperability across tools can be found on our BI and Data Science Overview Page.