Selective filtering: replace with NA observations based on values contained in other variables

ES_StatR · October 9, 2020, 7:44am

Hi everybody,
I'm working on a database that contains taxonomic information about Fungi. For each taxonomic rank (kingdom, division, family, genus, etc.) I have an associated probability value indicating how likely it is that the observed row actually belongs to that rank.

I need to filter out the observations that have a low probability, but instead of removing the rows I would like to replace the low-probability observations with NA.

Example:
tibble(tax_rank_one = rep("fungi", 10),
val_one = c(1, 1, 1, 0.4, 0.5, 1, 1, 1, 0.3, 0.9),
tax_rank_two = c("Basidiomycota", "Ascomycota", "Ascomycota", "Basidiomycota", "Basidiomycota", "Ascomycota", "Basidiomycota", "Ascomycota", "Ascomycota", "Ascomycota"),
val_two = c (1, 1, 1, 0.1, 0.1, 1, 0.5, 1, 0.1, 0.1))

I want to replace with NA every observation in "tax_rank_one" whose corresponding value in "val_one" is lower than 0.8, and do the same for "tax_rank_two" using "val_two".

Thanks all of you for the help!

technocrat · October 9, 2020, 7:52am

Here is the first pair; the second pair works the same way:

suppressPackageStartupMessages({
  library(dplyr)
  library(tibble)
  })
dat <- tibble(tax_rank_one = rep("fungi", 10),
       val_one = c(1, 1, 1, 0.4, 0.5, 1, 1, 1, 0.3, 0.9),
       tax_rank_two = c("Basidiomycota", "Ascomycota", "Ascomycota", "Basidiomycota", "Basidiomycota", "Ascomycota", "Basidiomycota", "Ascomycota", "Ascomycota", "Ascomycota"),
       val_two = c (1, 1, 1, 0.1, 0.1, 1, 0.5, 1, 0.1, 0.1))

dat %>% mutate(tax_rank_one = ifelse(val_one < 0.8, NA, tax_rank_one))
#> # A tibble: 10 x 4
#>    tax_rank_one val_one tax_rank_two  val_two
#>    <chr>          <dbl> <chr>           <dbl>
#>  1 fungi            1   Basidiomycota     1  
#>  2 fungi            1   Ascomycota        1  
#>  3 fungi            1   Ascomycota        1  
#>  4 <NA>             0.4 Basidiomycota     0.1
#>  5 <NA>             0.5 Basidiomycota     0.1
#>  6 fungi            1   Ascomycota        1  
#>  7 fungi            1   Basidiomycota     0.5
#>  8 fungi            1   Ascomycota        1  
#>  9 <NA>             0.3 Ascomycota        0.1
#> 10 fungi            0.9 Ascomycota        0.1

Created on 2020-10-09 by the reprex package (v0.3.0.9001)
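
For completeness, here is a minimal sketch that handles both pairs at once, using the same data and the 0.8 cutoff from the question. The names `threshold`, `masked`, `pairs` and `masked_loop` are only illustrative. It uses `dplyr::if_else()` with `NA_character_` so the rank columns stay character, and then shows a plain loop over rank/probability pairs, which may scale better if there are many ranks. Treat it as one possible approach rather than the definitive one.

```r
library(dplyr)
library(tibble)

dat <- tibble(
  tax_rank_one = rep("fungi", 10),
  val_one = c(1, 1, 1, 0.4, 0.5, 1, 1, 1, 0.3, 0.9),
  tax_rank_two = c("Basidiomycota", "Ascomycota", "Ascomycota", "Basidiomycota",
                   "Basidiomycota", "Ascomycota", "Basidiomycota", "Ascomycota",
                   "Ascomycota", "Ascomycota"),
  val_two = c(1, 1, 1, 0.1, 0.1, 1, 0.5, 1, 0.1, 0.1)
)

threshold <- 0.8  # cutoff taken from the question

# Option 1: one mutate() call, one line per rank/probability pair
masked <- dat %>%
  mutate(
    tax_rank_one = if_else(val_one < threshold, NA_character_, tax_rank_one),
    tax_rank_two = if_else(val_two < threshold, NA_character_, tax_rank_two)
  )

# Option 2: loop over a named vector mapping rank column -> probability column
# (assumed naming; adapt the mapping to the real column names)
pairs <- c(tax_rank_one = "val_one", tax_rank_two = "val_two")
masked_loop <- dat
for (rank_col in names(pairs)) {
  low <- masked_loop[[pairs[[rank_col]]]] < threshold
  masked_loop[[rank_col]][low] <- NA
}

masked
```

With more ranks, only the `pairs` mapping needs to grow; the probability columns themselves are left untouched in both options.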

ES_StatR · October 9, 2020, 8:27am

Thanks @technocrat, it works perfectly!
I will treasure this helpful code
