regneck
November 28, 2021, 3:43pm
1
Hi all!
I have a very similar dataset to one presented below:
M1 <- matrix(rnorm(100),nrow=100)
M2 <- rep(c("2011", "2012", "2013", "2014"), times = 25)
M1 <- as.data.frame(sapply(M1, as.numeric))
M4 <- cbind(M1, M2)
colnames(M4) <- c("value", "year")
M4[2:5,1] <- 0
Now, what I would like to do is to eliminate "0" in the table and replace them with a mean value grouped by a given year.
I try to use the following code:
M4 <- M4 %>% group_by(year) %>% mutate(nc = if_else(M4[,1] == 0, mean(M4[,1]), M4[,1]))
Unfortunately, I get the following error and I do not know how to solve it using similar solutions. I want to avoid splitting the data frame by year, calculating and then binding it again.
Error: Problem with mutate()
column nk
.
nk = if_else(M4[, 1] == 0, mean(M4[, 1]), M4[, 1])
.
nk
must be size 50 or 1, not 200.
The error occurred in group 1: year = "2011".
I think a few things are going on and this might work a bit better. I also am guessing you want the group mean without the 0s so I've chosen to not include those in the means.
library(tidyverse)
M1 <- matrix(rnorm(100),nrow=100)
M2 <- rep(c("2011", "2012", "2013", "2014"), times = 25)
M1 <- as.data.frame(sapply(M1, as.numeric))
M4 <- cbind(M1, M2)
colnames(M4) <- c("value", "year")
M4[2:5,1] <- 0
M4_imp <- M4 %>%
group_by(year) %>%
mutate(
value_temp=if_else(value==0, NA_real_, value),
group_mean=mean(value_temp, na.rm=TRUE),
nc = if_else(is.na(value_temp), group_mean, value)) %>%
ungroup()
M4_imp
#> # A tibble: 100 x 5
#> value year value_temp group_mean nc
#> <dbl> <chr> <dbl> <dbl> <dbl>
#> 1 0.0448 2011 0.0448 -0.000777 0.0448
#> 2 0 2012 NA -0.0562 -0.0562
#> 3 0 2013 NA -0.135 -0.135
#> 4 0 2014 NA -0.227 -0.227
#> 5 0 2011 NA -0.000777 -0.000777
#> 6 0.782 2012 0.782 -0.0562 0.782
#> 7 -0.688 2013 -0.688 -0.135 -0.688
#> 8 0.382 2014 0.382 -0.227 0.382
#> 9 -0.117 2011 -0.117 -0.000777 -0.117
#> 10 1.54 2012 1.54 -0.0562 1.54
#> # ... with 90 more rows
Created on 2021-11-28 by the reprex package (v2.0.1)
regneck
November 29, 2021, 9:57am
3
Thank you very much, StatSteph! That helped. Based on your solution, it looks like it can be even a bit shortened:
M4 <- M4 %>% group_by(year) %>% mutate(nc = if_else(is.na(value), mean(value, na.rm = TRUE), value))
Almost. With your example, there are no NA values, only 0 values so you probably want something like this:
M4 <- M4 %>% group_by(year) %>%
mutate(
value=if_else(value==0, NA_real_, value), #change 0s to NA
nc = if_else(is.na(value), mean(value, na.rm = TRUE), value))
system
Closed
December 6, 2021, 2:35pm
5
This topic was automatically closed 7 days after the last reply. New replies are no longer allowed. If you have a query related to it or one of the replies, start a new topic and refer back with a link.