Hello, I need help "de-imputing" my data. I know conceptually what I have to do and roughly what the result should look like, but I'm struggling with the implementation. I have done some reading on Stack Overflow and found this answer particularly helpful: "Replace all NA values with the (minimum value/2) value for each column, in large 6000+ column dataset" (this one wasn't bad either: "Correct syntax for mutate_if"). I'm trying to adapt the top answer there to my code here. The data comes to me already imputed, but for merging purposes I need to "un-impute" it.
Thankfully, the imputation process we use isn't complicated: missing values are imputed with the minimum value observed in each column, so it's just a matter of systematically going across the columns of the data frame and replacing each column's minimum value with NA. Below is some dummy code that provides a reproducible example of the kind of data I am working with:
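library(dplyr)  # needed for bind_rows() and the dplyr verbs used below
set.seed(1)     # makes the random toy data reproducible
# making a few toy data frames to construct an example of one final merged and imputed df.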
toy_df1 <- as.data.frame(matrix(data = rnorm(n = 100, mean = 0, sd = 1), nrow = 10, ncol = 10))
toy_df2 <- as.data.frame(matrix(data = rnorm(n = 100, mean = 0, sd = 1), nrow = 10, ncol = 10))
toy_df3 <- as.data.frame(matrix(data = rnorm(n = 100, mean = 0, sd = 1), nrow = 10, ncol = 10))
names(toy_df1) <- c("x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9", "x10")
names(toy_df2) <- c("x1", "x2", "x3", "x5", "x6", "x7", "x8", "x9", "x10", "x11")
names(toy_df3) <- c("x1", "x3", "x4", "x5", "x7", "x8", "x9", "x10", "x11", "x13")
# merging the toy dataframes together.
toy_data_all <- bind_rows(toy_df1, toy_df2, toy_df3)
# creating an imputation function I'll need.
imputeme <- function(x) {
  # replace every NA with the smallest observed value in the column
  ifelse(is.na(x), min(x, na.rm = TRUE), x)
}
# creating the imputed "data all" file/data set.
# apply() returns a matrix, so convert the result back to a data frame.
toy_data_all_imputed <- as.data.frame(apply(toy_data_all, 2, imputeme))
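Since apply() returns a matrix rather than a data frame, the imputation step can also be written with dplyr so the result stays a data frame. This is a sketch assuming dplyr 1.0 or later (for across() and where()); the small data frame df is a hypothetical stand-in for toy_data_all, and the lambda applies the same rule as imputeme (each NA becomes its column's observed minimum).

```r
library(dplyr)

# Hypothetical stand-in for toy_data_all: NAs mark the missing cells
df <- data.frame(
  x1 = c(0.5, NA, -1.2, 2.0),
  x2 = c(NA, 3.1, 1.4, NA)
)

# Same rule as imputeme(): each NA becomes its column's observed minimum.
# across() works column by column and the result stays a data frame.
df_imputed <- df %>%
  mutate(across(where(is.numeric),
                ~ replace(.x, is.na(.x), min(.x, na.rm = TRUE))))

df_imputed
```

Note the argument order of replace(x, list, values): the positions to overwrite (here is.na(.x)) come second and the new values come third.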
Here is where I'm running into issues. On my "real life" version of toy_data_all_imputed (which is called volNormImputedData), I am trying to run the following code to un-impute it:
# creating the data I need and piping it
volNormImputedData <- volNormData %>%
mutate_if(is.numeric, ~replace(., min(.), is.na(.)))
However, when I run this, it doesn't return any errors, but it does return 22 warnings that say number of items to replace is not a multiple of replacement length.
If you run the same code on the toy data instead (replacing volNormImputedData with toy_data_all_imputed and volNormData with toy_data_all), you get the same warning and, in addition, this error:
Error in mutate(): ! Problem while computing x2 = (structure(function (..., .x = ..1, .y = ..2, . = ..1) ... . Caused by error in x[list] <- values: ! NAs are not allowed in subscripted assignments
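Both messages can be reproduced on a plain vector, because base R's replace(x, list, values) is essentially x[list] <- values; x, so the positions come second and the new values third. In the lambda ~replace(., min(.), is.na(.)) the column minimum ends up being used as a position and the logical vector is.na(.) as the replacement values. The vectors below are made-up illustrations, not the real data.

```r
# Made-up column with no NAs; its minimum is -1.8
x_obs <- c(2.5, -1.8, 0.4, 1.1)

# Swapped arguments: list = min(x_obs) = -1.8, which R truncates to the
# index -1 ("every element except the first"); values = is.na(x_obs) has
# length 4. Three positions receive four values, hence the warning.
swapped_warning <- tryCatch(
  replace(x_obs, min(x_obs), is.na(x_obs)),
  warning = function(w) conditionMessage(w)
)

# The warning is not harmless: the FALSE values are coerced to 0
# and overwrite the data
corrupted <- suppressWarnings(replace(x_obs, min(x_obs), is.na(x_obs)))

# Made-up column that still contains an NA: min() without na.rm = TRUE
# returns NA, and an NA subscript with a length-4 replacement is an error
x_na <- c(2.5, NA, 0.4, 1.1)
swapped_error <- tryCatch(
  replace(x_na, min(x_na), is.na(x_na)),
  error = function(e) conditionMessage(e)
)

# Correct order: positions first (where x equals its minimum), then NA
fixed <- replace(x_obs, x_obs == min(x_obs), NA)

swapped_warning
corrupted
swapped_error
fixed
```

This would be consistent with the real (already imputed, NA-free) data producing only warnings, while toy_data_all, which still contains NAs, also produces the error.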
To try to get around this, I tried to reverse what my imputation function does by writing:
Un_imputeme <- function(x){
value <- ifelse(min(x,na.rm=TRUE),
is.na(x),x); value
}
But when I ran this on both my toy_data_all_imputed data set and my actual data, volNormImputedData, it didn't replace the minimum in each column with NA, so I am now stuck and could use some help.
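The Un_imputeme attempt has a similar argument-order problem: ifelse(test, yes, no) expects a per-element condition first, but min(x, na.rm = TRUE) is a single number, so it is coerced to one TRUE/FALSE value and, since ifelse() returns something the same shape as its test, the result has length 1 instead of one value per row. A minimal base-R correction, assuming the column minimum is exactly the imputed value, might look like this; the toy data frame is hypothetical, and lapply() plus as.data.frame() keeps the result a data frame rather than the matrix apply() returns.

```r
# Corrected version of Un_imputeme: the per-element test comes first,
# then the value for TRUE (NA), then the value for FALSE (x unchanged)
Un_imputeme <- function(x) {
  col_min <- min(x, na.rm = TRUE)
  ifelse(x == col_min, NA, x)
}

# Hypothetical imputed data: each column's minimum stands in for missing cells
toy_imputed <- data.frame(
  x1 = c(0.5, -1.2, -1.2, 2.0),
  x2 = c(1.4, 3.1, 1.4, 1.4)
)

# lapply() returns a list of columns; as.data.frame() rebuilds the data frame
toy_unimputed <- as.data.frame(lapply(toy_imputed, Un_imputeme))

toy_unimputed
```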
I really want to use the dplyr package if I can, partly because I'm trying to familiarize myself with it and become proficient in using it (I believe even my little imputation function could easily be written with a few dplyr commands), but also because that first example from Stack Overflow is so close to what I need, and I can't understand why the minimal change I made to it isn't working for my use case. I greatly appreciate the time taken to read this post and help me. Thank you!
-Radon.
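For the dplyr route, the minimal change to the mutate_if() line would be to pass the positions second and NA third, e.g. mutate_if(is.numeric, ~ replace(., . == min(., na.rm = TRUE), NA)). The sketch below writes the same idea with across(), which dplyr 1.0 introduced as the successor to the now-superseded mutate_if(), and round-trips a small hypothetical data frame through impute and un-impute. One caveat shows up in the result: a column's genuinely observed minimum is indistinguishable from the imputed copies of it, so it is set to NA as well. If the pre-imputation data (or a record of which cells were imputed) is available, that would allow an exact reversal instead.

```r
library(dplyr)

# Hypothetical stand-in for the un-imputed data: NAs are genuinely missing
original <- data.frame(
  x1 = c(0.5, NA, -1.2, 2.0),
  x2 = c(NA, 3.1, 1.4, NA)
)

# Impute: each NA becomes its column's observed minimum (the imputeme rule)
imputed <- original %>%
  mutate(across(where(is.numeric),
                ~ replace(.x, is.na(.x), min(.x, na.rm = TRUE))))

# Un-impute: replace(x, list, values) with list = "equals the column
# minimum" and values = NA
unimputed <- imputed %>%
  mutate(across(where(is.numeric),
                ~ replace(.x, .x == min(.x, na.rm = TRUE), NA)))

# Compare missingness: the genuine minima in row 3 are now NA too
is.na(original)
is.na(unimputed)
```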