One minor thing first:
When you want to include code in your response, try wrapping it in backticks (`) or using the preformatted text option in the message response (it's the </> button) - it will make your code easier to read. You can insert a whole block of code using three backticks (```), followed by the code, then another three backticks, which gives you a code block like this:
```r
some_answer <- some_function()
```
But on to your question:
I see you already have a post about this on the forum ("Functions and Missing values"), which you have marked as 'Solved'. Can you use the answer there to help you?
It sounds like you need to do a few things in this `foo()` function:
- Identify which values are missing
- Replace them with something
You can replace every missing value with a constant or, as you seem to be trying to do here, with the average of the column the missing value occurs in.
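As a rough sketch of those two steps on a made-up data frame (the column names and values are just for illustration, not from your data):

```r
# Toy data frame (made up for illustration)
df <- data.frame(a = c(1, NA, 3), b = c(NA, 5, 7))

# Step 1: identify the missing values
is.na(df$a)          # logical vector, TRUE where a value is missing
colSums(is.na(df))   # number of missing values in each column

# Step 2a: replace every missing value with a constant
df_const <- df
df_const[is.na(df_const)] <- 0

# Step 2b: replace the missing values in one column with that column's mean
df_mean <- df
df_mean$a[is.na(df_mean$a)] <- mean(df_mean$a, na.rm = TRUE)
```

Working on copies (`df_const`, `df_mean`) keeps the original `df` intact, so you can compare the two approaches side by side.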
We've already seen how to get the average of each column, excluding missing values, so we can incorporate that into your function too.
```r
foo <- function(df, mvf) {
  # Make a copy of the data frame to clean
  df_cln <- df

  # Find the values to replace, one per column; lapply() (unlike sapply())
  # keeps each replacement's own type when the columns are of mixed types
  replacements <- lapply(df, mvf, na.rm = TRUE)

  # Loop over the columns, and replace the missing values
  for (col in seq_len(ncol(df_cln))) {
    # Get the positions of the missing values in the column
    missing_vals <- is.na(df_cln[[col]])

    # Replace the missing values with this column's replacement
    df_cln[[col]][missing_vals] <- replacements[[col]]
  }

  # Return the cleaned data
  df_cln
}
```
The tricky thing here is that `mean()` can't be calculated for non-numeric data (it returns `NA` with a warning), so you may need to think of a different replacement value or missing-value function to handle text data (e.g. the species name in `iris`).
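One possible way round this, as a sketch only: a helper that returns the mean for numeric columns and the most common value for everything else. The name `mean_or_mode()` and the toy data are my own inventions for illustration. Note that `sapply()` would coerce a mix of numeric and text replacements into a single character vector, so this sketch calls the helper one column at a time instead.

```r
# Hypothetical helper (the name is made up for this example):
# mean for numeric columns, most frequent value for anything else
mean_or_mode <- function(x, na.rm = TRUE) {
  if (is.numeric(x)) {
    return(mean(x, na.rm = na.rm))
  }
  # table() ignores NA by default; ties go to the first value in table order
  counts <- table(x)
  names(counts)[which.max(counts)]
}

# Toy data with one numeric and one text column
df <- data.frame(
  len = c(1.5, NA, 2.5),
  species = c("setosa", NA, "setosa"),
  stringsAsFactors = FALSE
)

# Fill each column with its own replacement, one column at a time
filled <- df
for (col in names(filled)) {
  missing_vals <- is.na(filled[[col]])
  filled[[col]][missing_vals] <- mean_or_mode(filled[[col]])
}
```

This assumes every column has at least one non-missing value; a column that is entirely `NA` would need separate handling.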
Note also that you can use the `replace_na()` function from the `tidyr` package to do a lot of this, rather than the base-R code I've put above, but hopefully this code will get you going.
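For comparison, here is a minimal sketch of the `replace_na()` route, assuming `tidyr` is installed. For a data frame it takes a named list with one replacement per column; the data and the replacement values below are made up for illustration.

```r
library(tidyr)

# Toy data (made up for illustration)
df <- data.frame(
  a = c(1, NA, 3),
  b = c("x", NA, "z"),
  stringsAsFactors = FALSE
)

# One replacement per column, given as a named list
filled <- replace_na(df, list(a = 0, b = "unknown"))

# For all-numeric data, the list of replacements can be built from column means
df_num <- data.frame(a = c(1, NA, 3), b = c(NA, 5, 7))
filled_num <- replace_na(df_num, lapply(df_num, mean, na.rm = TRUE))
```

Columns left out of the list are not touched, so you can fill only the columns you care about.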