I watched and followed along with many of the outstanding Tidy Tuesday examples taught by @julia to learn tidymodels. I believe what I'm trying to do is most similar to the following example, which I've modified from tidymodels.org (XGBoost Regression).
In the original example, ridership from the Chicago dataset was predicted from 2 columns. What if I instead want to predict both ridership and Cubs_Home, using all the other columns as predictors? Is this possible?
library(tidymodels)
tidymodels_prefer()
data(Chicago)
n <- nrow(Chicago)
#split into training and testing sets
Chicago_train <- Chicago[1:(n - 7), ]
Chicago_test <- Chicago[(n - 6):n, ]
#model specs
bt_reg_spec <-
  boost_tree(trees = 15) %>%
  # This model can be used for classification or regression, so set mode
  set_mode("regression") %>%
  set_engine("xgboost")
bt_reg_spec
#Fit model
set.seed(1)
bt_reg_fit <- bt_reg_spec %>% fit(ridership ~ ., data = Chicago_train)
#############What if you want to predict ridership AND Cubs_Home?######################
#fit(ridership + Cubs_Home ~., data = Chicago_train) #### would it be this?????********
#######################################################################################
#Run prediction on test set
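# Observed values must come from the test set so they line up with the predictions.
# predict() returns a one-column tibble named .pred, and data.frame() keeps that name.
results_df <- data.frame("observed" = Chicago_test$ridership,
                         "predicted" = predict(bt_reg_fit, Chicago_test))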
#What's the correlation?
cor(results_df$observed, results_df$.pred)
[1] 0.8854543
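On the two-outcome question above: in a base R formula, a `+` on the left-hand side is ordinary arithmetic, so `ridership + Cubs_Home ~ .` would most likely be read as the sum of the two columns rather than as two separate outcomes. One possible workaround, shown here only as a rough sketch, is to fit one model per outcome. It assumes that Cubs_Home is a 0/1 indicator that can be treated as a class label, and that each outcome should be left out of the other model's predictors. The names `Chicago_two`, `rider_fit`, `cubs_fit`, and `preds` are made up for illustration.

```r
library(tidymodels)
tidymodels_prefer()
data(Chicago)

n <- nrow(Chicago)

# Assumption: Cubs_Home is a 0/1 indicator, so convert it to a factor
# before splitting (this keeps the factor levels identical in both sets)
Chicago_two <- Chicago %>% mutate(Cubs_Home = factor(Cubs_Home))
train <- Chicago_two[1:(n - 7), ]
test  <- Chicago_two[(n - 6):n, ]

set.seed(1)

# Regression model for ridership; Cubs_Home is dropped from the predictors
rider_fit <- boost_tree(trees = 15) %>%
  set_mode("regression") %>%
  set_engine("xgboost") %>%
  fit(ridership ~ ., data = select(train, -Cubs_Home))

# Classification model for Cubs_Home; ridership is dropped from the predictors
cubs_fit <- boost_tree(trees = 15) %>%
  set_mode("classification") %>%
  set_engine("xgboost") %>%
  fit(Cubs_Home ~ ., data = select(train, -ridership))

# One row per test day: .pred (ridership) and .pred_class (Cubs_Home)
preds <- bind_cols(
  predict(rider_fit, test),
  predict(cubs_fit, test)
)
preds
```

This treats the two outcomes as independent problems, so it does not capture any relationship between them; whether that is acceptable depends on what the predictions are for.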