From the documentation for the missForest() function, it looks like the first argument is:
xmis: a data matrix with missing values. The columns correspond to the variables and the rows to the observations.
If you're starting from a data frame, you might want to look at the example at the bottom of the function reference (see the link above; it is also pasted below):
## Nonparametric missing value imputation on mixed-type data:
data(iris)
summary(iris)
## The data contains four continuous and one categorical variable.
## Artificially produce missing values using the 'prodNA' function:
set.seed(81)
iris.mis <- prodNA(iris, noNA = 0.2)
summary(iris.mis)
## Impute missing values providing the complete matrix for
## illustration. Use 'verbose' to see what happens between iterations:
iris.imp <- missForest(iris.mis, xtrue = iris, verbose = TRUE)
## The imputation is finished after five iterations having a final
## true NRMSE of 0.143 and a PFC of 0.036. The estimated final NRMSE
## is 0.157 and the PFC is 0.025 (see Details for the reason taking
## iteration 4 instead of iteration 5 as final value).
## The final results can be accessed directly. The estimated error:
iris.imp$OOBerror
## The true imputation error (if available):
iris.imp$error
## And of course the imputed data matrix (do not run this):
## iris.imp$Ximp
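For a data frame of your own, a rough sketch of the preparation step might look like the following. It uses iris as a stand-in for H16_IC (I don't know what your data looks like), blanks out a few values by hand, and converts character columns to factors, on the assumption that missForest() wants numeric or factor columns. The imputation itself only runs if the missForest package is installed.

```r
# Stand-in for your own data: iris with one column turned into character
# (as often happens after an import) and some values blanked out by hand.
set.seed(1)
df <- iris
df$Species <- as.character(df$Species)
df$Sepal.Length[sample(nrow(df), 15)] <- NA
df$Species[sample(nrow(df), 15)] <- NA

# Assumption: missForest() expects numeric or factor columns,
# so character columns are converted to factors first.
is_chr <- vapply(df, is.character, logical(1))
df[is_chr] <- lapply(df[is_chr], factor)

# Quick checks before imputing: column types and NA counts per column.
str(df)
colSums(is.na(df))

# Only attempt the imputation if the package is available.
if (requireNamespace("missForest", quietly = TRUE)) {
  imp <- missForest::missForest(df)
  str(imp, max.level = 1)  # inspect the returned list
}
```

Swapping `iris` for your own object should be enough to try this on your data; the type check is the part most likely to reveal why a call fails.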
If H16_IC is already a data matrix with missing values, then it'll be much easier to help you if you can supply a reprex (short for reproducible example).
If you run into problems with access to your clipboard, you can specify an outfile for the reprex, and then copy and paste the contents into the forum.
Thank you very much for the guidance. I installed reprex as you suggested and ran the code. The problem persists; I got the following message:
iris.imp <- missForest(xmis=H16_IC, xtrue = H16_IC, verbose = TRUE)
#> Error in missForest(xmis = H16_IC, xtrue = H16_IC, verbose = TRUE): could not find function "missForest"
reprex is a package/tool to help you make a reproducible example so that we can help you troubleshoot (see the links in my earlier post to learn more).
Without a reprex, it's hard to say exactly what's going wrong in your code from the initial post, but the error you're getting below suggests that you don't have the missForest package loaded in your session, and, thus, R can't find the missForest() function.
#> Error in missForest(xmis = H16_IC, xtrue = H16_IC, verbose = TRUE): could not find function "missForest"
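One way to check this for yourself is sketched below. `pkg_status()` is a hypothetical helper written for this post, not part of any package; it reports whether a package is installed and whether it is attached to the current session, using base R only.

```r
# Hypothetical helper: is the package installed, and is it attached
# (i.e. has library() been called on it in this session)?
pkg_status <- function(pkg) {
  c(
    installed = requireNamespace(pkg, quietly = TRUE),
    attached  = paste0("package:", pkg) %in% search()
  )
}

pkg_status("missForest")

# If `installed` is FALSE: install.packages("missForest")
# If `attached` is FALSE:  library(missForest)
```

If `attached` is FALSE, a bare `missForest(...)` call fails with "could not find function", even when the package is installed.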
library(missForest)
#> Loading required package: randomForest
#> randomForest 4.6-14
#> Type rfNews() to see new features/changes/bug fixes.
#> Loading required package: foreach
#> Loading required package: itertools
#> Loading required package: iterators
summary(data)
#> Error in object[[i]]: object of type 'closure' is not subsettable
data.imp <- missForest(data)
#> Warning in is.na(xmis): is.na() applied to non-(list or vector) of type
#> 'closure'
#> Error in apply(is.na(xmis), 2, sum): dim(X) must have a positive length
Unfortunately, it looks like you didn't include your data inside of the reprex (that's part of the self-contained element), which is why you're getting errors like:
#> Error in nrow(xmis): object 'mydata' not found
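The 'closure' errors in your reprex have the same cause. With no object called `data` defined, R resolves the name to the built-in function `data()`, and a function can't be summarised or imputed like a data set. A small illustration, where `my_df` is a made-up example object:

```r
# With no user-defined object named `data`, the name refers to utils::data(),
# which is a function (R calls this type a "closure").
typeof(utils::data)
is.function(utils::data)

# Defining the object inside the reprex removes the ambiguity.
# `my_df` is invented here purely for illustration.
my_df <- data.frame(
  x = c(1, NA, 3),
  g = factor(c("a", "b", NA))
)
summary(my_df)

# Count the missing values per column.
na_counts <- colSums(is.na(my_df))
na_counts
```

Giving the object a name other than `data` also makes this kind of mix-up easier to spot.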
I don't know what format your data is in, but if it's spreadsheet-like, you can include a bit of it using the datapasta package:
Some more options from the reprex do's and don'ts article:
Use the smallest, simplest, most built-in data possible.
Think: iris or mtcars. Bore me.
If you must make some objects, minimize their size and complexity.
Many of the functions and packages you already use to import data from delimited files also offer a way to create a small data frame “inline”:
read.table() and friends have a text argument. Example: read.csv(text = "a,b\n1,2\n3,4").
tibble::tribble() lets you use a natural and readable layout. Example:
tibble::tribble(
  ~ a, ~ b,
    1,   2,
    3,   4
)
#> # A tibble: 2 x 2
#>       a     b
#>   <dbl> <dbl>
#> 1     1     2
#> 2     3     4
Get just a bit of something with head() or by indexing with the result of sample(). If anything is random, consider using set.seed() to make it repeatable.
dput() is a good way to get the code to create an object you have lying around, if you simply cannot make do with built-in or simulated data. Copy and paste the result of this into your reprex.
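To make the last two points concrete, here is a minimal sketch using mtcars as a stand-in for your own data. It takes a small, repeatable random subset and then uses dput() to produce code that recreates it. The round trip through a temporary file is only there to show that the dput() output rebuilds the same object; in practice you would copy the printed output into your reprex.

```r
# Take a small, repeatable random subset (mtcars stands in for your data).
set.seed(42)
rows  <- sample(nrow(mtcars), 5)
small <- mtcars[rows, c("mpg", "cyl", "am")]

# Print code that recreates `small`; this is what you paste into a reprex.
dput(small)

# Demonstration only: write the same code to a file and read it back.
tmp <- tempfile(fileext = ".R")
dput(small, file = tmp)
rebuilt <- dget(tmp)

# Check that the rebuilt object matches the original.
all.equal(small, rebuilt)
```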