The tabular foundation model TabPFN was reported earlier this year in Nature. It has generated a lot of excitement in the ML community because of its performance.
Is there any plan to incorporate TabPFN into the tidymodels ecosystem? That would make it much easier to use in existing codebases than running TabPFN separately in Python or calling it via reticulate.
Yes, before the end of the year. I made a package (feedback welcome) that uses reticulate. I did it in conjunction with Prior Labs, and the plan was to transfer it to them, but they have not responded.
The plan would be to:
determine which model function to use (mlp() or its own model type)
send it to CRAN.
There are a few small issues: one related to OpenMP, and another with their ensemble extension, which was not working for us a few months ago.
The minimal reprex on that page worked for me, but in practice I could never really get TabPFN to work in R through reticulate this way.
The main issue is that I'm fitting many models, and the CPU load causes R to abort. I remembered that the TabPFN GitHub repository recommends using GPUs, but configuring that was a pain, and in the end there were too many hang-ups for me to try to debug.
I will eagerly await the tidymodels integration. Something like set_engine("TabPFN") would make life a lot easier! Any consideration given to GPU acceleration would be much appreciated as well.