Why is the row name error message popping up?

Altec · March 22, 2022, 7:01pm

I get the error message mentioned in the title when trying to create a variable importance plot for my Cubist model with the caret package. The importance ranking works fine for random forest with everything else being the same, so I don't know what is wrong with this one.

The code to create the Cubist model is:
fit.cubist.32prdtrs.newdta1.minus_edu25 <- train(y ~ x1+x2+x3...x20, data = df, method = "cubist", trControl = ctrl, importance = TRUE)

williaml · March 23, 2022, 2:53am

Hi, can you provide a reproducible example?

FAQ: How to do a minimal reproducible example (reprex) for beginners

A minimal reproducible example consists of the following items:
A minimal dataset, necessary to reproduce the issue
The minimal runnable code necessary to reproduce the issue, which can be run on the given dataset, and including the necessary information on the used packages.
Let's quickly go over each one of these with examples:
Minimal Dataset (Sample Data)
You need to provide a data frame that is small enough to be (reasonably) pasted on a post, but big enough to reproduce your issue. Let's say, as an example, that you are working with the iris data frame
head(iris)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 5.1 3.5 1.4 0.…

nirgrahamuk · March 23, 2022, 8:45am

I would speculate that somehow your df has row.names, and the row.names have at least one duplicate.
I would investigate and try to correct this.
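
One way to investigate, as a minimal sketch only: `df` and the key column `id` below are hypothetical stand-ins, since the real data isn't shown in the thread. Base R normally refuses duplicated row names on a plain data.frame, so the duplicates may actually sit in a column that some step later tries to use as row names.

```r
# Hypothetical stand-in for the real data; `df` and `id` are assumptions.
df <- data.frame(id = c("a", "b", "b"), x = c(1, 2, 3))

# Check the row names themselves: 0 means none are duplicated.
n_dup_rownames <- anyDuplicated(row.names(df))

# Check a candidate key column and list every row whose key is repeated.
is_dup <- duplicated(df$id) | duplicated(df$id, fromLast = TRUE)
dup_rows <- df[is_dup, ]
print(dup_rows)

# Using a column with repeated values as row names is rejected by base R;
# tryCatch() captures the error message instead of stopping the script.
res <- tryCatch(
  {
    row.names(df) <- df$id
    "ok"
  },
  error = function(e) conditionMessage(e)
)
print(res)
```

If `dup_rows` is not empty for your own key column, those are the rows to look at first.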

Altec · March 28, 2022, 11:08pm

Thanks for the suggestion! Could you please elaborate a bit on what it means for the row.names in the df to have at least one duplicate? I looked up row.names and I see that most results are about reading in data using read.table. If it helps, my data set was created by joining two Excel data files together. What could I do to check which rows are duplicates?

nirgrahamuk · March 29, 2022, 9:35am

Just as a data.frame has column names, it also has row names. If you make a data.frame and don't set the row names, they will just be the row numbers (see row.names(iris)). But you can alter the row.names, and some data.frames you might use, such as the built-in ones that come with R, already have them (see row.names(mtcars)).
But row.names have to follow certain rules, such as being unique, i.e. not having duplicate values.
Set row.names(mydf) <- NULL to reset them to the default row numbers,
or use the tools in the tibble package to keep the information that was in the row.names as a regular column and have the row.names be the row numbering. See the tibble reference page "Tools for working with row names" (tidyverse.org) and look for rownames_to_column.
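
Both options can be sketched roughly like this. The object names `mydf`, `reset`, `kept` and the column name `model` are just for illustration, and the tibble call only runs if that package happens to be installed.

```r
# A small slice of a built-in data set that carries meaningful row names.
mydf <- mtcars[1:3, 1:2]
print(row.names(mydf))

# Option 1: drop the row names and fall back to the default numbering.
reset <- mydf
row.names(reset) <- NULL
print(row.names(reset))

# Option 2: keep the information as an ordinary column, then reset.
kept <- cbind(model = row.names(mydf), mydf)
row.names(kept) <- NULL
print(kept)

# The tibble helper does the same in one step (only if tibble is available).
if (requireNamespace("tibble", quietly = TRUE)) {
  kept_tbl <- tibble::rownames_to_column(mydf, var = "model")
  print(kept_tbl)
}
```

After either option, refit the model on the cleaned data frame and see whether the error still appears.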
