Future of caret?

eric_bickel · September 15, 2017, 3:37pm

Hey, I've been looking at github.com/rstudio and seeing a ton of interesting work around keras and tensorflow / tfestimators. I'm wondering if there is some potential to see a tidy-esque approach to a GPU modeling framework involving recipes and some version of caret that offloads some of the lower-level programming required for unique modeling techniques that may not be predefined within keras or tensorflow already.

That would be solid.

mara · September 15, 2017, 3:48pm

I don't know much about GPU modeling, but Max Kuhn's bookdown guide, The caret Package, was just updated on September 4th (as of my typing this), so it could be a good place to take a look.

RobertMyles · September 15, 2017, 5:44pm

The fantastic greta package also does a good job of keeping things R-esque.

mara · September 15, 2017, 5:49pm

Yes! And its author, Nick Golding, has done a great job with the documentation, too. The package is also on GitHub.

eric_bickel · September 15, 2017, 5:52pm

I'll have to check greta out. At first, I had it confused with gretl and I had flashbacks to darker times haha.

eric_bickel · September 15, 2017, 5:53pm

Thank you! I'm also secretly hoping that Max Kuhn sees this post and opens up some office hours himself.

RobertMyles · September 15, 2017, 8:12pm

He's also super-helpful.

Max · September 16, 2017, 7:31am

"if there is some potential to see a tidy-esque approach to a GPU modeling framework involving recipes and some version of caret that offloads some of the lower-level programming required for unique modeling techniques that may not be predefined within keras or tensorflow already"

That hadn't crossed my mind. There are/will be connections between caret/recipes and the tensorflow packages. The last release of caret contains two neural net models built on keras, and I'm playing with adding autoencoders to recipes.

One issue is that "not be predefined within keras or tensorflow already" means a lot of close-to-the-metal work in tensorflow, and right now I would avoid that since 1) I don't know the intricacies of that system and 2) the API might change a lot.

Also, it's my belief that the GPU is optimized and fast for GPU-things (like matrix calculations) and doesn't help that much otherwise. For example, I don't know what it would offer for something like trees.

I think that a more likely integration would be to have some recipes steps off-loaded to tensorflow. The autoencoder is a good prototype for that. I'd like to have complete backends for recipes (as in dplyr) so that you can use remote data in another system and use recipes to tell the system what to do.
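
For readers who haven't used recipes yet, here is a minimal sketch of what a sequence of steps looks like. It is only an illustration: the autoencoder step mentioned above is not shown (it is described as work in progress), so step_pca() stands in as an existing step that learns a lower-dimensional encoding. The mtcars data, the choice of steps, and num_comp = 2 are all assumptions made for the example, and everything here runs locally in R rather than on tensorflow.

```r
library(recipes)

# Define the preprocessing: mpg is the outcome, everything else a predictor.
# mtcars is a stand-in data set chosen only for illustration.
rec <- recipe(mpg ~ ., data = mtcars)
rec <- step_center(rec, all_predictors())
rec <- step_scale(rec, all_predictors())

# step_pca() is a placeholder for a learned encoding step; a step backed by
# an autoencoder would conceptually sit in the same position.
rec <- step_pca(rec, all_predictors(), num_comp = 2)

# Estimate the step parameters on the training data, then apply them.
trained <- prep(rec, training = mtcars)
processed <- bake(trained, new_data = mtcars)

head(processed)
```

The point of the split between prep() and bake() is that the recipe is a description of what to do, separate from the data it is applied to, which is what would make a remote backend plausible.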

Max · September 16, 2017, 7:31am

Yes, I will get on that and schedule some.

eric_bickel · September 16, 2017, 5:28pm

This is super exciting stuff! I am a huuuge caret fanboy and have been translating a lot of our pre-processing work over to recipes as well, so anything that promotes that framework is fantastic in my book.

I hadn't thought of the algorithms that are outside the optimal scope of GPU processing. Also, from a user perspective, we tend to offload those types of calculations to parallelization across CPU cores. Not sure if that is useful info or not, but from my perspective a combo of CPU and GPU processing is super strong.
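
As a rough sketch of the CPU side of that, this is one common way to spread caret's resampling across cores with a doParallel backend. The iris data, the rpart tree model, the two workers, and the 5-fold CV are all assumptions picked to keep the example small, not a description of our actual setup.

```r
library(caret)
library(doParallel)

# Start a small cluster of worker processes (2 is an arbitrary choice)
# and register it as the foreach backend that caret uses.
cl <- makePSOCKcluster(2)
registerDoParallel(cl)

set.seed(1)
fit <- train(
  Species ~ .,
  data = iris,
  method = "rpart",  # a tree model, i.e. not an obvious GPU candidate
  trControl = trainControl(method = "cv", number = 5, allowParallel = TRUE)
)

# Shut the workers down and fall back to sequential execution.
stopCluster(cl)
registerDoSEQ()

fit$results
```

The resamples are independent of each other, so they parallelize across cores without any change to the model code itself.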