I am working on a dataset with 1 500 00 rows, and I used the old method to deal with NA: I replace NA with 0 and then run a for loop. It is far too slow (very, very slow!).
I would like to know whether I can replace this method with a dplyr function.
Description:
If all 4 columns are NA, I put 0 in each cell in place of the NA.
If 2 columns are NA (e.g. X2, Y2), I put 0 in those cells and also set (X1, Y1) to 0.
If (X1, Y1) and (X2, Y2) are all non-NA, I keep the values.
Thanks in advance for your help!
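# Replace every NA with 0
tab.na <- replace(tab, is.na(tab), 0)
for (i in seq_len(nrow(tab.na))) {
  # If any of the four columns is 0 (i.e. was NA), set all four to 0;
  # a genuine 0 in the data is treated the same as a missing value
  if (tab.na$X1[i] == 0 || tab.na$Y1[i] == 0 || tab.na$X2[i] == 0 || tab.na$Y2[i] == 0) {
    tab.na$X1[i] <- 0
    tab.na$Y1[i] <- 0
    tab.na$X2[i] <- 0
    tab.na$Y2[i] <- 0
  }
}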
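For comparison, the same rule can be applied without a per-row loop. This is a minimal base R sketch, assuming `tab` has numeric columns X1, Y1, X2, Y2 and that a row should be zeroed whenever any of the four is NA. The sample values and the helper names `cols` and `incomplete` are hypothetical:

```r
# Made-up sample data standing in for the real `tab`
tab <- data.frame(
  X1 = c(1, NA, 3, NA),
  Y1 = c(2, NA, 4, NA),
  X2 = c(5, 6, NA, NA),
  Y2 = c(7, 8, NA, NA)
)

cols <- c("X1", "Y1", "X2", "Y2")

# TRUE for every row where at least one of the four columns is NA
incomplete <- rowSums(is.na(tab[cols])) > 0

tab.na <- tab
# Zero out all four columns of the incomplete rows in a single assignment
tab.na[incomplete, cols] <- 0

tab.na
```

Because it tests `is.na()` on the raw data, a real 0 is not mistaken for a missing value. The whole operation also takes a few vectorised calls instead of one R-level iteration per row.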
There are some nice options in the thread here, using a combination of tidyr::replace_na(), scoped dplyr verbs (e.g. mutate_at()), as well as some base R options:
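Here is a minimal sketch of those three kinds of approach on made-up data. The column names come from the question, but the values are hypothetical. It uses a `~` lambda rather than `funs()`, which is deprecated in recent dplyr versions. `across()` assumes dplyr 1.0.0 or later:

```r
library(dplyr)
library(tidyr)

# Made-up sample data standing in for `tab`
tab <- data.frame(
  X1 = c(1, NA, 3),
  Y1 = c(2, NA, 4),
  X2 = c(5, 6, NA),
  Y2 = c(7, 8, NA)
)

# 1. Scoped verb with a lambda: replace NA with 0 in the listed columns
out_at <- tab %>%
  mutate_at(vars(X1, Y1, X2, Y2), ~ replace_na(., 0))

# 2. The same replacement written with across()
out_across <- tab %>%
  mutate(across(c(X1, Y1, X2, Y2), ~ replace_na(.x, 0)))

# 3. Base R: index with the logical NA matrix, no packages needed
out_base <- tab
out_base[is.na(out_base)] <- 0
```

All three only swap NA for 0 cell by cell. None of them applies the "zero the whole row" rule from the question on its own.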
1. I tried mutate_all(funs(replace_na(., 0))) and it works (it is really fast).
2. With mutate_if, it is not the same thing.
Example:
_ I want to put 0 in columns X1 and Y1 when X2 and Y2 contain 0.
_ I want to put 0 in columns X2 and Y2 when X1 and Y1 contain 0.
_ When columns X1, Y1, X2 and Y2 are all filled, nothing changes.
Is this possible?
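This rule (zero all four columns when any of them is missing, otherwise keep the row) can likely be expressed in dplyr instead of a loop. The sketch below is minimal and rests on a few assumptions. The columns are numeric. The test runs on NA in the raw data, before any `replace_na()`. The installed dplyr version provides `if_any()`. The helper column `incomplete` is a hypothetical name, and the data are made up:

```r
library(dplyr)

# Made-up sample data; X1, Y1, X2, Y2 assumed numeric
tab <- data.frame(
  X1 = c(1, NA, 3, NA),
  Y1 = c(2, NA, 4, NA),
  X2 = c(5, 6, NA, NA),
  Y2 = c(7, 8, NA, NA)
)

tab.na <- tab %>%
  # Flag rows where at least one of the four columns is NA
  mutate(incomplete = if_any(c(X1, Y1, X2, Y2), is.na)) %>%
  # In flagged rows set all four columns to 0; complete rows keep their values
  mutate(across(c(X1, Y1, X2, Y2), ~ replace(.x, incomplete, 0))) %>%
  # Drop the temporary flag column
  select(-incomplete)

tab.na
```

Since every flagged row is fully overwritten and unflagged rows have no NA, no separate `replace_na()` step is needed here.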
Could you please turn this into a self-contained reprex (short for reproducible example)? It will help us help you if we can be sure we're all working with/looking at the same stuff.
If you run into problems with access to your clipboard, you can specify an outfile for the reprex, and then copy and paste the contents into the forum.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tidyr)
data<-tab %>%
select(X1,Y1,X2,Y2)
#> Error in eval(lhs, parent, parent): object 'tab' not found
tab.na<-data%>%
mutate_all(.,funs(replace_na(., 0))) %>%
mutate_if(is.numeric,funs(replace_na(., 0)))
#> Error in UseMethod("tbl_vars"): no applicable method for 'tbl_vars' applied to an object of class "function"
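Both errors seem to come from `tab` not being defined inside the reprex. The first assignment fails, so `data` still refers to R's built-in `utils::data()` function, and passing a function to `mutate_all()` gives the `tbl_vars` error. A self-contained version would define a small `tab` at the top. One common way to generate that definition from real data is `dput(head(tab))`. The sketch below uses made-up values and a hypothetical name `tab_sel`, chosen so that `data` is not shadowed:

```r
library(dplyr)
library(tidyr)

# Small made-up stand-in for the real `tab`
tab <- data.frame(
  X1 = c(1, NA, 3),
  Y1 = c(2, NA, 4),
  X2 = c(5, 6, NA),
  Y2 = c(7, 8, NA)
)

# Keep only the four columns of interest
tab_sel <- tab %>%
  select(X1, Y1, X2, Y2)

# One pass replacing NA with 0 in every column; a lambda replaces funs()
tab.na <- tab_sel %>%
  mutate(across(everything(), ~ replace_na(.x, 0)))

tab.na
```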