Error in missForest command

Hi,

I am new to R. I am trying to use the missForest package to impute missing values, but the command returns the following error message:

iris.imp <- missForest(H16_IC, xtrue = H16_IC, verbose = TRUE)
Error in sample.int(length(x), size, replace, prob) :
invalid first argument

I would appreciate your advice on resolving this issue.

Regards,

J

From the documentation for the missForest() function, it looks like the first argument is:

xmis: a data matrix with missing values. The columns correspond to the variables and the rows to the observations.

If you're starting from a data frame, you might want to look at the example at the bottom of the function reference (see link above, and pasted below):

## Nonparametric missing value imputation on mixed-type data:
data(iris)
summary(iris)

## The data contains four continuous and one categorical variable.

## Artificially produce missing values using the 'prodNA' function:
set.seed(81)
iris.mis <- prodNA(iris, noNA = 0.2)
summary(iris.mis)

## Impute missing values providing the complete matrix for
## illustration. Use 'verbose' to see what happens between iterations:
iris.imp <- missForest(iris.mis, xtrue = iris, verbose = TRUE)

## The imputation is finished after five iterations having a final
## true NRMSE of 0.143 and a PFC of 0.036. The estimated final NRMSE
## is 0.157 and the PFC is 0.025 (see Details for the reason taking
## iteration 4 instead of iteration 5 as final value).

## The final results can be accessed directly. The estimated error:
iris.imp$OOBerror

## The true imputation error (if available):
iris.imp$error

## And of course the imputed data matrix (do not run this):
## iris.imp$ximp
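If H16_IC is a data frame rather than a matrix, it may help to check its column types before calling missForest(). The sketch below is an assumption on my part, not a confirmed diagnosis of the sample.int() error: it coerces the input to a plain data.frame and turns character columns into factors, mirroring the numeric-plus-factor layout of iris in the example above. The helper name prep_for_missforest() and the toy data are hypothetical.

```r
# Hypothetical helper (not part of missForest): reshape a data frame into
# numeric and factor columns, the mix used by the iris example above.
prep_for_missforest <- function(df) {
  df <- as.data.frame(df)  # drops extra classes such as tbl_df
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], factor)  # character -> factor
  df
}

# Made-up toy data standing in for H16_IC
toy <- data.frame(
  height = c(1.2, NA, 3.4, 2.2),
  group  = c("a", "b", NA, "a"),
  stringsAsFactors = FALSE
)

toy_ready <- prep_for_missforest(toy)
str(toy_ready)             # check the column types
colSums(is.na(toy_ready))  # missing values per column
```

Running str() and colSums(is.na(...)) on the real data is also a quick way to spot surprises, such as a column that is entirely NA.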

If H16_IC is already a data matrix with missing values, then it'll be much easier to help you if you can supply a reprex (short for reproducible example).

install.packages("reprex")

If you've never heard of a reprex before, you might want to start by reading the tidyverse.org help page. The reprex dos and don'ts are also useful.

What to do if you run into clipboard problems

If you run into problems with access to your clipboard, you can specify an outfile for the reprex, and then copy and paste the contents into the forum.

reprex::reprex(input = "fruits_stringdist.R", outfile = "fruits_stringdist.md")

For pointers specific to the community site, check out the reprex FAQ.


Dear Mara,

Thank you very much for the guidance. I installed reprex as you suggested and ran the code. The problem still persists; I got the following message:

iris.imp <- missForest(xmis=H16_IC, xtrue = H16_IC, verbose = TRUE)
#> Error in missForest(xmis = H16_IC, xtrue = H16_IC, verbose = TRUE): could not find function "missForest"

reprex is a package/tool that helps you make a reproducible example so that we can help you troubleshoot (see the links in my earlier post to learn more).

Without a reprex, it's hard to say exactly what's going wrong in your code from the initial post, but the error you're getting below suggests that you don't have the missForest package loaded in your session, and, thus, R can't find the missForest() function.

#> Error in missForest(xmis = H16_IC, xtrue = H16_IC, verbose = TRUE): could not find function "missForest"
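One detail worth knowing here: reprex renders your code in a fresh R session, so packages you loaded in your interactive session are not available inside it, and library(missForest) has to be part of the code you reprex. A minimal sketch of loading the package defensively, installing it only if it is missing (the CRAN mirror URL is just one common choice):

```r
# Install missForest only if it is not already installed
if (!requireNamespace("missForest", quietly = TRUE)) {
  install.packages("missForest", repos = "https://cloud.r-project.org")
}

# Attach it; inside a reprex this line must be part of the reprexed code
library(missForest)

# missForest() should now be found on the search path
exists("missForest")
```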

Hi Mara, thank you for the quick response. I will try and learn about reprex.

In fact, I did install the missForest package and ran library(missForest) before running the imputation code.

As you suggested, I will read up on reprex and get back to you shortly.

Thanks a lot

J

Dear Mara, here is my reprex output. Thank you for teaching me about reprex. This is very helpful.

#### imputation 
library(missForest)
#> Loading required package: randomForest
#> randomForest 4.6-14
#> Type rfNews() to see new features/changes/bug fixes.
#> Loading required package: foreach
#> Loading required package: itertools
#> Loading required package: iterators
library(randomForest)
mydata<-(H16_IC[,1:46])
#> Error in eval(expr, envir, enclos): object 'H16_IC' not found
im.out <- missForest(xmis = mydata, maxiter = 10, ntree = 100,
                       variablewise = FALSE,
                       decreasing = FALSE, verbose = FALSE,
                       mtry = floor(sqrt(ncol(wi.miss))), replace = TRUE,
                       classwt = NULL, cutoff = NULL, strata = NULL,
                       sampsize = NULL, nodesize = NULL, maxnodes = NULL,
                       xtrue = NA, parallelize = "no")
#> Error in nrow(xmis): object 'mydata' not found

Created on 2018-12-05 by the reprex package (v0.2.1)

library(missForest)
#> Loading required package: randomForest
#> randomForest 4.6-14
#> Type rfNews() to see new features/changes/bug fixes.
#> Loading required package: foreach
#> Loading required package: itertools
#> Loading required package: iterators
summary(data)
#> Error in object[[i]]: object of type 'closure' is not subsettable
data.imp <- missForest(data)
#> Warning in is.na(xmis): is.na() applied to non-(list or vector) of type
#> 'closure'
#> Error in apply(is.na(xmis), 2, sum): dim(X) must have a positive length

Created on 2018-12-05 by the reprex package (v0.2.1)
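In the second reprex no object called data was ever created, so the name data resolves to the function utils::data(), which loads built-in data sets. That is where the 'closure' in both messages comes from: closure is R's type for an ordinary function, and a function cannot be subset or checked for NAs column by column. A quick way to see what a name refers to before handing it to missForest():

```r
# No user object named `data` exists here, so the name finds utils::data()
is.function(data)  # TRUE
typeof(data)       # "closure", the type R uses for ordinary functions

# Check for your own object without falling back to packages on the
# search path (H16_IC is the poster's data; it is absent in a fresh session)
exists("H16_IC", envir = globalenv(), inherits = FALSE)
```

Giving the data frame a distinctive name, rather than data, makes this kind of mix-up easier to spot.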

Unfortunately, it looks like you didn't include your data inside of the reprex (that's part of the self-contained element), which is why you're getting errors like:

#>  Error in nrow(xmis): object 'mydata' not found
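For comparison, here is a sketch of the first reprex made self-contained. It uses the built-in iris data with missing values added by prodNA() as a stand-in for H16_IC[, 1:46], which only exists on your machine; you would substitute a small dput() copy of your real data. It also computes mtry from mydata, because wi.miss in the original call is never defined and would have raised its own "object not found" error once mydata existed.

```r
library(missForest)  # reprex runs in a fresh session, so load it here

# Stand-in for H16_IC[, 1:46]: built-in iris with 10% of values set to NA
set.seed(81)
mydata <- prodNA(iris, noNA = 0.1)

im.out <- missForest(xmis = mydata, maxiter = 10, ntree = 100,
                     variablewise = FALSE, decreasing = FALSE,
                     verbose = FALSE,
                     # mtry is computed from mydata; wi.miss was never defined
                     mtry = floor(sqrt(ncol(mydata))), replace = TRUE,
                     xtrue = NA, parallelize = "no")
# classwt, cutoff, strata, sampsize, nodesize and maxnodes are left at
# their NULL defaults, which is what the original call passed anyway

summary(im.out$ximp)  # the imputed data
im.out$OOBerror       # estimated imputation error (NRMSE and PFC)
```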

I don't know what format your data is in, but if it's spreadsheet-like, you can include a bit of it using the datapasta package.


Some more options from the reprex do's and don'ts article:

Use the smallest, simplest, most built-in data possible.

  • Think: iris or mtcars. Bore me.
  • If you must make some objects, minimize their size and complexity.
  • Many of the functions and packages you already use to import data from delimited files also offer a way to create a small data frame “inline”:
      • read.table() and friends have a text argument. Example: read.csv(text = "a,b\n1,2\n3,4").
      • tibble::tribble() lets you use a natural and readable layout. Example:
        tibble::tribble(
          ~a, ~b,
           1,  2,
           3,  4
        )
        #> # A tibble: 2 x 2
        #>       a     b
        #>   <dbl> <dbl>
        #> 1     1     2
        #> 2     3     4
  • Get just a bit of something with head() or by indexing with the result of sample(). If anything is random, consider using set.seed() to make it repeatable.
  • dput() is a good way to get the code to create an object you have lying around, if you simply cannot make do with built-in or simulated data. Copy and paste the result of this into your reprex.
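As a concrete sketch of the last few points: the snippet below builds a tiny data frame inline with read.csv(text = ...), using NA for missing values, takes a repeatable random sample of iris, and round-trips it through dput(). The column names and values in the inline table are made up for illustration.

```r
# Option 1: a tiny data frame written inline, with NA marking missing values
small_inline <- read.csv(text = "x,y,group
1.5,NA,a
2.1,3.3,b
NA,4.0,a")

# Option 2: a small, repeatable random sample of a built-in data set
set.seed(1)
small <- iris[sample(nrow(iris), 5), ]

# dput() prints R code that recreates the object exactly; paste that
# printed code into the reprex instead of relying on a local file
dput(small)

# Sanity check: the printed code rebuilds an identical object
code <- capture.output(dput(small))
copy <- eval(parse(text = code))
identical(copy, small)
```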
