Cross Validation Plot in R

AC3112 · February 13, 2022, 1:29pm

I am fairly new to machine learning techniques such as cross-validation, and also quite new to R programming.

However, I am interested in replicating an out-of-sample technique used by Hothorn & Zeileis (2020).

https://www.tandfonline.com/doi/full/10.1080/10618600.2021.1872581

In particular, Figure 15 in Section 3.3 of the Supplementary Manual is said to be generated using an "out-of-sample (50 times 4:1 subsampling) approach."

Could someone post some code showing how to do this kind of '50 times 4:1' cross-validation on a toy data set?

The estimation methods and performance metrics are irrelevant. I would just love to see a process for replicating the graph and the '50 times 4:1' cross-validation.

Would be genuinely appreciated.

FJCC · February 14, 2022, 5:38am

This does not include cross-validation, but I think it shows the basics of making such a plot. The key is to have a column that identifies each point, so the lines can connect the same point across model types.

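library(ggplot2)
set.seed(123)

# Simulated scores: 100 baseline values, shifted to mimic four models
Orig <- rnorm(100, mean = 8000, sd = 100)
DF <- data.frame(ID = rep(1:100, 4),  # identifies the same point across models
                 Model = rep(c("A", "B", "C", "D"), each = 100),
                 Value = c(Orig, Orig - 500, Orig - 700, Orig - 600))

# Grey lines connect each ID across models; boxplots are drawn on top
ggplot(DF, aes(x = Model, y = Value)) +
  geom_line(aes(group = ID), color = "grey80", alpha = 0.3) +
  geom_boxplot() +
  theme_classic()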

Created on 2022-02-13 by the reprex package (v2.0.1)
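
For the resampling part that the plot above leaves out, here is a minimal sketch of one way to read "50 times 4:1 subsampling": 50 random splits, each using 80% of the rows for fitting and the remaining 20% for evaluation, drawn without replacement. That reading is an assumption, not something confirmed from the paper. The toy data, the three `lm()` models, and the RMSE metric are arbitrary stand-ins chosen only for illustration. The split number plays the role of the ID column, so each grey line connects the models evaluated on the same split.

```r
library(ggplot2)
set.seed(123)

# Toy data (hypothetical): y depends non-linearly on x1 and weakly on x2
n   <- 200
toy <- data.frame(x1 = runif(n), x2 = runif(n))
toy$y <- 2 * sin(2 * pi * toy$x1) + 0.5 * toy$x2 + rnorm(n, sd = 0.5)

# Candidate models (arbitrary choices, for illustration only)
fits <- list(
  Mean   = function(d) lm(y ~ 1, data = d),
  Linear = function(d) lm(y ~ x1 + x2, data = d),
  Cubic  = function(d) lm(y ~ poly(x1, 3) + x2, data = d)
)

n_rep   <- 50               # "50 times"
n_train <- round(0.8 * n)   # 4:1 split, assumed to mean 80% train / 20% test

# One row per (split, model): out-of-sample RMSE on the held-out 20%
res <- do.call(rbind, lapply(seq_len(n_rep), function(i) {
  idx   <- sample(n, n_train)   # subsample without replacement
  train <- toy[idx, ]
  test  <- toy[-idx, ]
  rmse  <- sapply(fits, function(f) {
    pred <- predict(f(train), newdata = test)
    sqrt(mean((test$y - pred)^2))
  })
  data.frame(ID = i, Model = names(fits), RMSE = unname(rmse))
}))
res$Model <- factor(res$Model, levels = names(fits))

# Same plot recipe as above: lines grouped by split ID, boxplots on top
ggplot(res, aes(x = Model, y = RMSE)) +
  geom_line(aes(group = ID), color = "grey80", alpha = 0.5) +
  geom_boxplot() +
  theme_classic()
```

Because every model is fitted and scored on the same 50 splits, the lines show paired differences between models rather than just the marginal spread. To use another metric, such as an out-of-sample log-likelihood, only the function inside `sapply()` needs to change.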

AC3112 · February 14, 2022, 8:25am

Thank you @FJCC. This gives me a good idea of the kind of information included in the plot.
