Issue with randomForest: Error in eval(predvars, data, env)

Hello,
I am trying to run randomForest() on the following data:

head(Z1_D3)
ADCYAP1 PAX6 ISL1 ASCL1 ID3 ZNF555 NEUROD1 NKX2-2 CTNNB1
ctrl1st2_AAACCCAAGAGCCTGA.1 0.4356985 0.3140325 0.3602318 0.05124110 0.11842043 0.3975497 0.04783136 0.2748929 0.2714019
ctrl1st2_AAACCCACACGCTTAA.1 0.4377882 0.3264363 0.4005710 0.03520356 0.07404521 0.4723255 0.03190159 0.2759243 0.3121997
ctrl1st2_AAACCCACACGTAGTT.1 0.4601432 0.2370267 0.4364042 0.01830891 0.11393943 0.3972491 0.03621590 0.2681049 0.2861740
ctrl1st2_AAACGAACATTGAGCT.1 0.4149943 0.3274806 0.4263463 0.04283993 0.06996843 0.4217448 0.03454223 0.2744649 0.3118044
ctrl1st2_AAACGCTCAAATCGGG.1 0.3980319 0.3103125 0.4111845 0.02914746 0.09620920 0.4272904 0.03856660 0.3002395 0.3004722
ctrl1st2_AAACGCTCACCGTGAC.1 0.3853714 0.3078610 0.4220731 0.04086714 0.04721513 0.4140166 0.02985199 0.2555154 0.3029383

str(Z1_D3)
'data.frame': 16383 obs. of 140 variables:
 $ ADCYAP1: num 0.436 0.438 0.46 0.415 0.398 ...
 $ PAX6   : num 0.314 0.326 0.237 0.327 0.31 ...
 $ ISL1   : num 0.36 0.401 0.436 0.426 0.411 ...
 $ ASCL1  : num 0.0512 0.0352 0.0183 0.0428 0.0291 ...
 $ ID3    : num 0.1184 0.074 0.1139 0.07 0.0962 ...
 $ ZNF555 : num 0.398 0.472 0.397 0.422 0.427 ...
 $ NEUROD1: num 0.0478 0.0319 0.0362 0.0345 0.0386 ...
 $ NKX2-2 : num 0.275 0.276 0.268 0.274 0.3 ...
 $ CTNNB1 : num 0.271 0.312 0.286 0.312 0.3 ...
 $ EGR2   : num 0.0464 0.0662 0.0992 0.0719 0.0426 ...

When I run the following code, I get these errors:
fit_rf <- randomForest(D3~., data = Z1_D3)
Error in eval(predvars, data, env) : object 'NKX2-2' not found
D3_tf_top10 <- importance(fit_rf)[order(importance(fit_rf)[, 1], decreasing = T), ][1:10]
Error in importance(fit_rf) : object 'fit_rf' not found

I checked my data and the NKX2-2 column is present, but I still get the error. What could be the reason? Any help would be appreciated.

Maybe the - in the column name is causing a problem. Can you rename the column to NKX2_2?
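A hyphen is not allowed in a syntactic R name, so a column called NKX2-2 can trip up code that builds formulas from column names. Here is a minimal sketch of how to find and rename such columns. The `toy` data frame and its values are made up for illustration and only stand in for Z1_D3.

```r
# Made-up data standing in for Z1_D3; check.names = FALSE keeps the hyphen
toy <- data.frame(PAX6 = c(0.31, 0.33, 0.24),
                  `NKX2-2` = c(0.27, 0.28, 0.27),
                  D3 = c(1, 2, 3),
                  check.names = FALSE)

# Column names that are not syntactically valid R names
bad <- names(toy)[names(toy) != make.names(names(toy))]
print(bad)

# Replace every hyphen with an underscore, e.g. NKX2-2 becomes NKX2_2
names(toy) <- gsub("-", "_", names(toy), fixed = TRUE)
print(names(toy))
```

With 140 columns there may be more than one offending name, so renaming all of them in one step is probably safer than fixing NKX2-2 alone. `make.names(names(toy), unique = TRUE)` is an alternative that replaces invalid characters with a dot instead.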

Hello @FJCC, thank you for your kind reply and for the suggestion. I changed the name of the column to NKX2-2, but the issue still persists.

I don't have any good ideas. To test if the problem is particular to randomForest(), try

fit_rf <- lm(D3 ~ .,  data = Z1_D3)

Also, please show the result of running

"NKX2_2" %in% colnames(Z1_D3)
fit_rf <- randomForest(D3~., data = Z1_D3)

Paste the output between lines with three back ticks, like this:
```
pasted output goes here
```
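If lm() runs but randomForest() still fails, the problem may be specific to randomForest()'s formula interface. In that case, one thing to try is passing the predictors and the response directly through the `x` and `y` arguments, which avoids the formula altogether. This is only a sketch: it assumes the randomForest package is installed, and the `toy` data below is made up to stand in for Z1_D3.

```r
library(randomForest)  # assumed to be installed

set.seed(1)
# Made-up data with a hyphenated column name, standing in for Z1_D3
toy <- data.frame(PAX6 = runif(50),
                  `NKX2-2` = runif(50),
                  check.names = FALSE)
toy$D3 <- toy$PAX6 + rnorm(50, sd = 0.1)

# Pass predictors (x) and response (y) directly instead of D3 ~ .
predictors <- toy[, setdiff(names(toy), "D3"), drop = FALSE]
fit_rf <- randomForest(x = predictors, y = toy$D3)

# Rank predictors by the first importance column and keep up to 10
imp <- importance(fit_rf)
top <- head(imp[order(imp[, 1], decreasing = TRUE), , drop = FALSE], 10)
print(top)
```

On the real data the same pattern would use Z1_D3 in place of `toy`. Using `drop = FALSE` keeps the row names, so the result shows which gene each importance value belongs to.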
