I haven't used tidymodels myself, but it would make sense to me to prepare the variables in df_train and df_test in the same way. step_mutate(Survived = factor(as.logical(Survived))) is applied to df_train only; perhaps do it earlier in your flow, so that the mutation happens before the data is split?
No, as I understand it, when you fit the workflow with a recipe and a model, calling predict with the workflow and a new dataset should apply any transformations specified in the recipe to the new data. At least, that's what the next section in the tutorial suggests.
Sorry, I looked at this a little, but I have no idea how to do this the tidymodels way. It's only version 0.1, so maybe it's not fully featured yet.
In the tutorial you linked, the arr_delay outcome variable is defined before the recipe is created, indeed before even the split. So perhaps outcome/target variables are not suitable for step_mutate() and similar steps?
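To illustrate what I mean by converting the outcome before the split, here is a rough base-R sketch. The toy data frame and the manual split are my own assumptions, not your actual data or code; rsample::initial_split() would presumably be the tidymodels equivalent of the sampling step.

```r
# Hypothetical toy data standing in for the real data frame
df <- data.frame(
  Survived = c(0, 1, 1, 0, 1, 0, 0, 1),
  Age      = c(22, 38, 26, 35, 54, 2, 27, 14)
)

# Convert the outcome once, before any split
df$Survived <- factor(as.logical(df$Survived))

# Simple manual split (assumption: 6 rows for training, 2 for testing)
set.seed(123)
train_idx <- sample(nrow(df), size = 6)
df_train  <- df[train_idx, ]
df_test   <- df[-train_idx, ]

# Both pieces inherit the same factor levels from df
same_levels <- identical(levels(df_train$Survived), levels(df_test$Survived))
```

Because the conversion happens on the full data frame, the recipe no longer needs to touch Survived at all.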
That's great. I suppose the downside is that if someone gave you new data as a CSV, you'd have to convert Survived manually rather than rely on the recipe to do it?
Perhaps it's worth raising this as an issue on the recipes GitHub repository.
See the package vignette on skipping steps. In general, when baking new data (i.e. executing the finished recipe), you can't ensure that the outcome will be available. Skipping steps that involve the outcome is the way to get around this.
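As a minimal sketch of what skipping looks like, assuming a reasonably recent version of recipes and using made-up toy data (df_train and df_new here are hypothetical, not the data from this thread):

```r
library(recipes)

# Hypothetical training data with a 0/1 outcome
df_train <- data.frame(
  Survived = c(0, 1, 1, 0, 1, 0),
  Age      = c(22, 38, 26, 35, 54, 2)
)

# Hypothetical new data that has no outcome column at all
df_new <- data.frame(Age = c(30, 8))

rec <- recipe(Survived ~ Age, data = df_train)

# skip = TRUE: the step runs during prep() on the training set,
# but is not executed when bake() is called on new data
rec <- step_mutate(rec, Survived = factor(as.logical(Survived)), skip = TRUE)

rec_prepped <- prep(rec, training = df_train)

# Processed training set: the skipped step has been applied here
train_baked <- bake(rec_prepped, new_data = NULL)

# New data: the step is skipped, so the missing outcome is not a problem
new_baked <- bake(rec_prepped, new_data = df_new)
```

The trade-off is that the outcome in baked new data is left exactly as it was supplied, so any comparison against predictions needs the conversion done separately.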
That's a good point. I just tried it: I left out the mutate when loading the data and put step_mutate back in the recipe with skip = TRUE, and I can fit the model without any problems. But of course it makes more sense to do the mutate when loading, rather than having to mutate twice.