Hello,
I have the following code for a function that replaces the missing values in a given dataset (df) with values computed by a function (mvf):
foo <- function(df, mvf) {
  # Make data frame to clean
  df_cln <- df
  # Find the values to replace
  replacements <- sapply(df, mvf, na.rm = TRUE)
  # Loop over the columns, and replace the missing values
  for (col in seq_len(ncol(df_cln))) {
    # Get the replacement
    replacement <- replacements[col]
    # Get the positions of the missing values in the column
    missing_vals <- is.na(df_cln[, col])
    # Replace the missing values
    df_cln[missing_vals, col] <- replacement
  }
  # Return the cleaned data
  df_cln
}
The problem I am having is finding a function (mvf) that will replace the missing values with the mean of the column that the missing value is found in, so that it can work with my above foo() function.
Is there any help that could be suggested for this please?
Without your df and mvf, I am not quite sure I understood the question. Can you provide them?
As a solution, or maybe just a hint if I misunderstood, here is a dummy example you can reproduce showing how to fill the NAs in a table with a value calculated on each column:
Find the replacements by applying a function to each column.
Feed this named list to replace_na() to replace each NA in each column with the corresponding replacement value.
library(tidyverse)
dummy <- tibble::tribble(
~ V1, ~ V2, ~ V3,
1, NA, 3,
NA, 2, 17,
5, 8, NA,
)
mean_replace <- purrr::map(dummy, mean, na.rm = TRUE)
# a named list of replacement
str(mean_replace)
#> List of 3
#> $ V1: num 3
#> $ V2: num 5
#> $ V3: num 10
dummy %>%
tidyr::replace_na(mean_replace)
#> # A tibble: 3 x 3
#> V1 V2 V3
#> <dbl> <dbl> <dbl>
#> 1 1.00 5.00 3.00
#> 2 3.00 2.00 17.0
#> 3 5.00 8.00 10.0
Created on 2018-02-05 by the reprex package (v0.1.1.9000).
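If you would rather stay with your own foo(), here is a minimal sketch of how it might be called. It assumes a base data.frame with only numeric columns (dummy_df is hypothetical data, not yours). Since foo() already passes na.rm = TRUE to mvf through sapply(), mean itself looks like a suitable mvf:

```r
# foo() as posted in the question
foo <- function(df, mvf) {
  df_cln <- df
  # One replacement value per column
  replacements <- sapply(df, mvf, na.rm = TRUE)
  for (col in seq_len(ncol(df_cln))) {
    missing_vals <- is.na(df_cln[, col])
    df_cln[missing_vals, col] <- replacements[col]
  }
  df_cln
}

# Hypothetical example data: a base data.frame with numeric columns only
dummy_df <- data.frame(
  V1 = c(1, NA, 5),
  V2 = c(NA, 2, 8),
  V3 = c(3, 17, NA)
)

# Pass mean directly as mvf; each NA is replaced by its column mean
cleaned <- foo(dummy_df, mean)
cleaned
```

Any function that takes a vector and accepts na.rm (median, for instance) should work the same way.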
I've made the edits you suggested; however, for some reason they're not appearing in the form I edited them in.
I also tried applying that but didn't have any luck.
You have to put triple backticks (```) on the line above and the line below your code.
When you are writing a question in Discourse, you have a preview. Check the preview to see the changes before saving your edit.
Moreover, there is a button that does that too.
Select your paragraph of code and click on that button.
The idea is to provide something close to a reprex to help us help you. That is what I did above. It will allow me to copy and paste your code more easily.
Have you tried playing with replace_na or not? Does it do what you want? If not, what is missing?
This is the kind of critical information you have to state at the beginning. We can't guess...
Why do you not want to use libraries that exist to make your life easier? Sometimes you want to avoid dependencies, but when doing an analysis it is not worth it. Moreover, libraries like dplyr are optimized for performance and stability to help you.
With this in mind, now about your code. Some advice:
When you are trying to debug something, try doing it step by step:
Is df[,which(colSums(is.na(df))>0)] working?
Is df[,which(colSums(is.na(df))>0)][is.na(df[,which(colSums(is.na(df))>0)])] working?
...
You will encounter error messages that will help you understand.
Here, you are trying to replace all the NAs in all columns in one step. An easier approach is to do it column by column:
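As a rough illustration of that step-by-step approach, here is a sketch on hypothetical data (this df is made up; the intermediate names na_counts, na_cols and sub are mine). Each piece of the long expression is evaluated on its own so you can inspect it before moving on:

```r
# Hypothetical data: V2 has no missing values
df <- data.frame(
  V1 = c(1, NA, 5),
  V2 = c(2, 4, 8),
  V3 = c(3, 17, NA)
)

# Step 1: how many NAs are in each column?
na_counts <- colSums(is.na(df))
na_counts

# Step 2: which columns contain at least one NA?
na_cols <- which(na_counts > 0)
na_cols

# Step 3: keep only those columns
sub <- df[, na_cols]
sub

# Step 4: where are the NAs inside that subset?
is.na(sub)

# Step 5: the full expression from above, built from pieces already checked
sub[is.na(sub)]
```

If one of the steps fails or returns something unexpected, that is the place to look.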
This snippet applies a function to each column (2). The function works on one column: it searches for the NAs and replaces them with the mean of that column.
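A minimal sketch of what such a per-column replacement could look like, assuming a data frame with only numeric columns (the data and the helper name fill_with_mean are hypothetical):

```r
# Hypothetical example data with numeric columns only
df <- data.frame(
  V1 = c(1, NA, 5),
  V2 = c(NA, 2, 8),
  V3 = c(3, 17, NA)
)

# Works on a single column: find the NAs and replace them with the column mean
fill_with_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# MARGIN = 2 applies the function to each column.
# apply() returns a matrix, so convert back to a data frame.
filled_df <- as.data.frame(apply(df, 2, fill_with_mean))
filled_df

# Alternative that keeps the data frame structure throughout
df_filled <- df
df_filled[] <- lapply(df, fill_with_mean)
df_filled
```

Note that apply() first converts the data frame to a matrix, so the lapply() variant is probably safer if some of your columns are not numeric.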
In your code there are some issues with the dimensions and with the way you are trying to do the replacement. I won't go into detail, but you have to pay attention to what the right-hand side (RHS) is that you are assigning to the left-hand side (LHS). When I try your code, the LHS throws an error.