Why is the row name error message popping up?

I get the error message above when trying to create a variable importance plot for my Cubist model with the caret package. The importance ranking works fine for random forest with everything else being the same, so I don't know what is wrong with this one.

The code to create the Cubist model is
fit.cubist.32prdtrs.newdta1.minus_edu25 <- train(y ~ x1+x2+x3...x20, data = df, method = "cubist", trControl = ctrl, importance = TRUE)

Hi, can you provide a reproducible example?

I would speculate that somehow your df has row.names, and the row.names have at least one duplicate.
I would investigate and try to correct this.

Thanks for the suggestion! Could you please elaborate a bit on what it means for row.names to have at least one duplicate in the df? I looked up row.names, and most results are about reading in data using read.table. If it helps, my data set was created by joining two Excel data files together. What could I do to check which rows are duplicates?

Just as a data.frame has column names, it also has row names. If you create a data.frame and don't set the row names yourself, they are simply the row numbers (see row.names(iris)). You can change the row names, though, and some data.frames you might use, such as the built-in ones that come with R, already have meaningful ones (see row.names(mtcars)).
But row names have to follow certain rules, such as being unique, i.e. not containing duplicate values.
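To make "duplicate" concrete, here is a minimal sketch on made-up data. The data frame `toy` and its `id` column are hypothetical stand-ins for the joined Excel data, not anything from the original question. It lists the identifier values that occur more than once and shows that base R refuses to use them as row names.

```r
# Toy data standing in for the joined data (hypothetical values).
toy <- data.frame(
  id = c("a", "b", "b", "c"),
  y  = c(1.2, 3.4, 3.4, 5.6),
  stringsAsFactors = FALSE
)

# Identifier values that appear more than once.
dup_ids <- unique(toy$id[duplicated(toy$id)])

# All rows that share an identifier with some other row.
dup_rows <- toy[toy$id %in% dup_ids, ]
print(dup_rows)

# Trying to use the duplicated identifiers as row names fails;
# the error message is captured here instead of stopping the script.
res <- tryCatch(
  {
    row.names(toy) <- toy$id
    "ok"
  },
  error = function(e) conditionMessage(e)
)
print(res)
```

If `dup_rows` is not empty for the column you would expect to identify each row, the join is a likely place where the duplicates came from.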
To clear them back to the default row numbers, run row.names(mydf) <- NULL.
Alternatively, use the tools in the tibble package to keep the information that was in the row names as an ordinary column and reset the row names to row numbers. See "Tools for working with row names" in the tibble documentation (tidyverse.org) and look for rownames_to_column.
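Both options can be sketched on the built-in mtcars data. The object names `cars_reset`, `cars_keep`, `cars_tbl` and the column name `model` are my own choices for illustration, and the tibble call only runs if that package happens to be installed.

```r
# Option 1: drop the row names so they fall back to 1, 2, 3, ...
cars_reset <- mtcars
row.names(cars_reset) <- NULL

# Option 2 (base R): keep the old row names as an ordinary column, then reset.
cars_keep <- mtcars
cars_keep$model <- row.names(cars_keep)
row.names(cars_keep) <- NULL

# Option 2 (tibble): the same idea in a single call, if tibble is available.
if (requireNamespace("tibble", quietly = TRUE)) {
  cars_tbl <- tibble::rownames_to_column(mtcars, var = "model")
}

head(cars_keep[, c("model", "mpg")], 3)
```

After either option the row names are plain row numbers, so they cannot contain duplicates.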
