Hi All,
I've been attempting to use the PReMiuM package, but I am struggling with some of the basics and would really appreciate any feedback.
Mainly, I am struggling with some of the arguments to the regression function, and with the format of the data it requires.
My data is a standard data frame with a continuous dependent variable constrained to the unit interval and categorical covariates (some ordinal, some binary).
If I were to run something like a random forest in R, I would do:
rf <- randomForest(y ~ var1 + var2 + ... + varN, data = data, mtry = 3, ntree = 500)
Here, I pass y and the covariates directly to randomForest through the formula and the data argument. With the PReMiuM regression, however, it looks as though the data must be pre-formatted?
For example,
library("PReMiuM")
inputs <- generateSampleDataFile(clusSummaryVarSelectBernoulliDiscrete())
runInfoObj <- profRegr(yModel = inputs$yModel, xModel = inputs$xModel,
nSweeps = 10000, nBurn = 20000, seed = seed,
data = inputs$inputData, output = "output",
covNames = inputs$covNames, nClusInit = 20,
run = TRUE)
Here, it seems that 'inputs' is already encoded:
$covNames
 [1] "Variable1"  "Variable2"  "Variable3"  "Variable4"  "Variable5"  "Variable6"  "Variable7"  "Variable8"
 [9] "Variable9"  "Variable10"
$xModel
[1] "Discrete"
$yModel
[1] "Bernoulli"
$nCovariates
[1] 10
Is it possible to get my own data into such a format, i.e. a list split into inputs$xModel, inputs$yModel, inputs$nCovariates, etc.?
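To make the question concrete, here is my best guess at building such a list by hand in base R. It is only a sketch, and several things in it are assumptions on my part rather than anything I have confirmed: the toy data and its column names (y, var1, var2) are made up, I am assuming the outcome column should be called "outcome", that discrete covariates should be integer codes starting at 0, and that a logit transform of y with yModel = "Normal" is a sensible stand-in for an outcome on the unit interval.

```r
# Toy stand-in for my real data: outcome in (0, 1), categorical covariates.
# All column names here are hypothetical.
set.seed(1)
n <- 50
myData <- data.frame(
  y    = runif(n, 0.05, 0.95),
  var1 = factor(sample(c("low", "mid", "high"), n, replace = TRUE),
                levels = c("low", "mid", "high"), ordered = TRUE),
  var2 = factor(sample(c("no", "yes"), n, replace = TRUE))
)

covNames <- c("var1", "var2")

# Assumption: discrete covariates are expected as integer codes 0, 1, 2, ...
xCoded <- lapply(myData[covNames], function(col) as.integer(col) - 1L)

# Assumption: logit-transform y so that a "Normal" outcome model is plausible,
# and name the outcome column "outcome".
inputData <- data.frame(outcome = qlogis(myData$y), xCoded)

# Same element names as the 'inputs' list printed above.
myInputs <- list(
  inputData   = inputData,
  covNames    = covNames,
  xModel      = "Discrete",
  yModel      = "Normal",
  nCovariates = length(covNames)
)

str(myInputs, max.level = 1)
```

If this shape is right, I presume I could pass myInputs$yModel, myInputs$xModel, myInputs$inputData and myInputs$covNames to profRegr exactly as in the example above, but I have not confirmed that.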
All feedback would be appreciated.