Wrong result when comparing columns with if_else()

Hi Community, I want to compare two columns that contain NA values. I am using the code below, but the comparison does not work correctly. How can I fix it?

Why do the two if_else() calls give different results?

library(tidyverse)
example <- structure(list(
  SUBSP = c(NA, NA, NA, "TENUIFLORUS", "TENUIFLORUS", NA, NA, NA, NA),
  `NEW-SUBTAXA` = c("0", "0", "0", "0", "0", "0", "0",
                    "var. marginata", "var. leucacrantha"),
  NEW_SUBTAXA = c(NA, NA, NA, "TENUIFLORUS", "TENUIFLORUS", NA, NA,
                  "var. marginata", "var. leucacrantha")
), row.names = c(2284L, 2339L, 3118L, 3460L, 9571L, 9837L, 9940L, 5028L, 8839L),
class = "data.frame")

example <- example |> 
  dplyr::mutate(NEW_SUBTAXA_CHECK = dplyr::if_else(identical(SUBSP, NEW_SUBTAXA)| (is.na(SUBSP) & is.na(NEW_SUBTAXA)),'IGUAL','DIFERENTE')) |> 
  dplyr::mutate(NEW_SUBTAXA_CHECK2 = dplyr::if_else(SUBSP==NEW_SUBTAXA,'IGUAL','DIFERENTE'))
example
#>            SUBSP       NEW-SUBTAXA       NEW_SUBTAXA NEW_SUBTAXA_CHECK
#> 2284        <NA>                 0              <NA>             IGUAL
#> 2339        <NA>                 0              <NA>             IGUAL
#> 3118        <NA>                 0              <NA>             IGUAL
#> 3460 TENUIFLORUS                 0       TENUIFLORUS         DIFERENTE
#> 9571 TENUIFLORUS                 0       TENUIFLORUS         DIFERENTE
#> 9837        <NA>                 0              <NA>             IGUAL
#> 9940        <NA>                 0              <NA>             IGUAL
#> 5028        <NA>    var. marginata    var. marginata         DIFERENTE
#> 8839        <NA> var. leucacrantha var. leucacrantha         DIFERENTE
#>      NEW_SUBTAXA_CHECK2
#> 2284               <NA>
#> 2339               <NA>
#> 3118               <NA>
#> 3460              IGUAL
#> 9571              IGUAL
#> 9837               <NA>
#> 9940               <NA>
#> 5028               <NA>
#> 8839               <NA>

I believe identical() is checking for equality in the entire vector and returning a single FALSE, while == is doing an element-by-element comparison and returning a logical vector.
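
To make that difference concrete, here is a minimal base R sketch. The vectors `a` and `b` are made up for illustration and are not the columns from the data above.

```r
# Illustrative vectors (hypothetical, not taken from the example data)
a <- c(NA, "TENUIFLORUS", NA)
b <- c(NA, "TENUIFLORUS", "var. marginata")

# identical() compares the two objects as a whole and returns one value
whole <- identical(a, b)                    # FALSE

# == compares element by element; any comparison involving NA gives NA
elementwise <- a == b                       # NA, TRUE, NA

# The single FALSE is recycled against the is.na() vector, so only the
# rows where both values are NA end up TRUE; the matching non-NA row is lost
combined <- whole | (is.na(a) & is.na(b))   # TRUE, FALSE, FALSE
```

That recycling would explain why the TENUIFLORUS rows are labelled DIFERENTE in NEW_SUBTAXA_CHECK even though both columns hold the same value.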

In this case, what is the best way to get the correct comparison results?
The final output should be:

FINAL_CHECK
# IGUAL
# IGUAL
# IGUAL
# IGUAL
# IGUAL
# IGUAL
# IGUAL
# DIFERENTE
# DIFERENTE

How can I fix this?

Edit: sorry, I got confused by the presence of NEW-SUBTAXA, which is different from NEW_SUBTAXA and is apparently not used in the question...

example <- example |> 
  dplyr::mutate(NEW_SUBTAXA_CHECK = dplyr::if_else(identical(SUBSP, NEW_SUBTAXA)| (is.na(SUBSP) & is.na(NEW_SUBTAXA)),'IGUAL','DIFERENTE')) |> 
  dplyr::mutate(NEW_SUBTAXA_CHECK2 = dplyr::if_else(SUBSP==NEW_SUBTAXA,'IGUAL','DIFERENTE')) |>
  dplyr::mutate(NEW_SUBTAXA_CHECK3 = dplyr::if_else(tidyr::replace_na(SUBSP,"@SPECIAL") ==
                                                      tidyr::replace_na(NEW_SUBTAXA,"@SPECIAL"),'IGUAL','DIFERENTE'))
example
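
The idea behind NEW_SUBTAXA_CHECK3 is to swap every NA for a placeholder string before comparing, so that == only ever sees real values. Below is a rough base R sketch of the same idea on plain vectors. `fill_na()` is a hypothetical helper standing in for tidyr::replace_na(), and the sketch assumes the placeholder "@SPECIAL" never occurs in the real data.

```r
# The two columns from the example, as plain vectors
SUBSP <- c(NA, NA, NA, "TENUIFLORUS", "TENUIFLORUS", NA, NA, NA, NA)
NEW_SUBTAXA <- c(NA, NA, NA, "TENUIFLORUS", "TENUIFLORUS", NA, NA,
                 "var. marginata", "var. leucacrantha")

# Hypothetical helper: base R stand-in for tidyr::replace_na() on a vector
fill_na <- function(x, sentinel = "@SPECIAL") {
  x[is.na(x)] <- sentinel
  x
}

# Guard for the assumption that the placeholder is not a real value
stopifnot(!any(c(SUBSP, NEW_SUBTAXA) %in% "@SPECIAL"))

# With the NAs replaced, == returns only TRUE or FALSE, never NA
check3 <- ifelse(fill_na(SUBSP) == fill_na(NEW_SUBTAXA), "IGUAL", "DIFERENTE")
```

On these nine rows `check3` is IGUAL for the first seven and DIFERENTE for the last two, which matches the FINAL_CHECK column requested in the question.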

Does

example <- example |>
  dplyr::mutate(NEW_SUBTAXA_CHECK2 = 
                  # missing = covers rows where the condition itself is NA
                  dplyr::if_else((SUBSP == NEW_SUBTAXA) | (is.na(SUBSP) & is.na(NEW_SUBTAXA)),
                                 'IGUAL', 'DIFERENTE', missing = 'DIFERENTE'))

do what you want?
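
In case it helps, here is a small base R sketch of why the fourth argument to if_else() is needed. The vectors `x` and `y` are made up for illustration, and the last step is only a base R approximation of what the `missing` value does, not dplyr's own implementation.

```r
# Illustrative vectors (hypothetical, not the real columns)
x <- c(NA, "TENUIFLORUS", NA)
y <- c(NA, "TENUIFLORUS", "var. marginata")

# NA | TRUE is TRUE, but NA | FALSE stays NA, so a row with one NA and
# one real value leaves the whole condition NA
cond <- (x == y) | (is.na(x) & is.na(y))   # TRUE, TRUE, NA

# Without special handling, the NA condition propagates to the result
result <- ifelse(cond, "IGUAL", "DIFERENTE")

# Base R approximation of if_else()'s missing value: label NA conditions
result[is.na(cond)] <- "DIFERENTE"
```

After the last line `result` holds IGUAL, IGUAL, DIFERENTE.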

PS: Following on from @nirgrahamuk, it would be better not to have two variable names that are so easily confused. It took me a while to sort them out too.


Both responses work excellently, thanks guys. (I don't know which one to mark as the solution :upside_down_face:)

Sorry for the confusion with the columns. I included NEW-SUBTAXA because it was used in the process.
