Representative Random Forest Plot

AC3112 · November 24, 2021, 3:20pm

Hi All,

I wanted to generate a representative decision tree plot from a random forest output.

Thus far, I have found a couple of routines: 'reprtree' and one using the 'caret' package.

The reprtree routine is as follows:

library(randomForest)
library(reprtree)

model <- randomForest(Species ~ ., data=iris, importance=TRUE, ntree=500, mtry = 2, do.trace=100)

reprtree:::plot.getTree(model)

However, when I use the reprtree routine on my own data, the tree is very large. I wondered if anyone knew how to control the depth and complexity of the tree in reprtree?

If anyone has alternative methods, that would also be appreciated.

nirgrahamuk · November 24, 2021, 4:42pm

This is not the representative tree; it's precisely the first tree from the set of trees composing the random forest, plotted in full. You can pass depth = some integer to plot that tree down to a given maximum depth, and k to choose a different tree from the forest. But this would still not be a representative tree.
I think you may be intending:

reprtree:::plot.reprtree( ReprTree(model, iris, metric='d2'))

I think the latter also supports a depth param; try it out.
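
For anyone following along, here is a minimal sketch of the k and depth arguments mentioned above. It assumes the reprtree package is already installed and that plot.getTree accepts k and depth as described in this reply; the choice of the 5th tree and a depth of 3 is arbitrary.

```r
library(randomForest)
library(reprtree)

set.seed(42)
model <- randomForest(Species ~ ., data = iris, ntree = 500, mtry = 2)

# Plot a single tree from the forest (here the 5th, chosen arbitrarily),
# truncated to depth 3. This is one tree of the forest, not a representative tree.
reprtree:::plot.getTree(model, k = 5, depth = 3)
```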

AC3112 · November 24, 2021, 4:59pm

Thanks @nirgrahamuk . I appreciate the new code.

It certainly prints out a tree-like structure. However, the tree is so large that it blurs across the page and looks uninterpretable.

Do you think this is a model-specific problem (my model has around 12 independent variables), or do you think with more tuning, it could be made to look as tidy and interpretable as a decision tree produced by CART-like methods?

nirgrahamuk · November 24, 2021, 5:09pm

Is this after trying the depth parameter?

AC3112 · November 24, 2021, 5:19pm

Yeah, with the depth parameter set = 3.

I specified my model as:

model <- randomForest(Y ~ ., data = train, importance = TRUE, ntree = 500,
                      mtry = 3, do.trace = 100, depth = 3)

By comparison, the depth parameter didn't seem to be available in the ReprTree function.

nirgrahamuk · November 24, 2021, 5:23pm

No, you want the randomForest to be whatever depth it is, but your representative tree to be plotted at a human-readable depth.
I was telling you about reprtree:::plot.reprtree() having a depth param, i.e.:

reprtree:::plot.reprtree( ReprTree(model, iris, metric='d2'),depth=3)
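
To make the distinction concrete: as far as I can tell, randomForest() has no documented depth argument, and the size of the trees inside the forest is governed by nodesize and maxnodes instead. The sketch below is only an illustration, using iris as a stand-in for the real data and an arbitrary maxnodes value. It shows that capping tree size changes the fitted model itself, which is a different thing from truncating a plot with depth = 3.

```r
library(randomForest)

set.seed(1)

# Default forest: trees are grown to full size (subject to nodesize).
full <- randomForest(Species ~ ., data = iris, ntree = 100)

# maxnodes caps the number of terminal nodes per tree. This alters the model,
# unlike the depth argument of the plotting function, which only trims the picture.
small <- randomForest(Species ~ ., data = iris, ntree = 100, maxnodes = 4)

# treesize() returns the number of terminal nodes of each tree in the forest.
summary(treesize(full))
summary(treesize(small))
```

Comparing the two summaries shows how much smaller the capped trees are; whether that trade-off is acceptable depends on how much predictive accuracy the smaller forest loses.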

AC3112 · November 24, 2021, 5:30pm

Thanks @nirgrahamuk. Much much better. I appreciate your help.

Can I ask if you know of the other metrics available for constructing the tree beyond the distance metric used in the example above?

nirgrahamuk · November 24, 2021, 5:32pm

Sadly, the documentation of reprtree makes it clear that only d2 has been implemented.
What metric would you prefer?
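
For context, my understanding is that d2 compares trees by their predictions: two trees are close if they predict similarly on the same data, and the representative tree is the one with the smallest average distance to all the others. Below is a rough sketch of that idea using only randomForest. It is not reprtree's actual code, and using the disagreement rate as the distance for a classification forest is my assumption.

```r
library(randomForest)

set.seed(1)
model <- randomForest(Species ~ ., data = iris, ntree = 50)

# One column of predicted class labels per tree, one row per observation.
preds <- predict(model, iris, predict.all = TRUE)$individual

# Pairwise distance: fraction of observations on which two trees disagree.
n_trees <- ncol(preds)
d <- matrix(0, n_trees, n_trees)
for (i in seq_len(n_trees)) {
  for (j in seq_len(n_trees)) {
    d[i, j] <- mean(preds[, i] != preds[, j])
  }
}

# Average distance from each tree to every other tree;
# the tree with the smallest value is the most "central" one.
avg_dist <- rowSums(d) / (n_trees - 1)
best <- which.min(avg_dist)

# Inspect the first few splits of that tree.
head(getTree(model, k = best, labelVar = TRUE))
```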

AC3112 · November 24, 2021, 5:35pm

No no. I didn't have a metric in mind, I must admit. I've just always associated that type of metric with unsupervised clustering etc.

However, again, I appreciate your help with the code above (and all the other instances you've helped recently).

Thanks for your time.
