sum of zeros equals NA not 0 in case_when statement

I have a large number of variables that may contain zeros and NA's and I want to convert zeros to NA depending on certain conditions.

One of these conditions is that if all the values in multiple columns equal zero then convert to 999.

I'd like to do this in one function as the dataset is huge. Any idea on why sum(.) returns NA or a workaround?

Sample dataframe

library(tidyr)
library(dplyr)

x<-c(NA,-1,0,0,0)
y<-c(NA,23,NA,0,10)

xy<-as.data.frame(cbind(x=x,y=y))

xy

xy%>%
  mutate_at(c("x","y"),
            ~case_when(sum(.)==0~999,
                       . %in% c(NA,0)~NA_real_,
                       TRUE~.))

Output looks like this
print (xy)

  x  y
1 NA NA
2 -1 23
3 NA NA
4 NA NA
5 NA 10

row 4 should be 999 999

Hi, and welcome!

Could you provide a reproducible example, called a reprex, please? Making it easier to test the code will usually bring out more and better answers.

1 Like

I apologize if the reprex was there all along and I simply glossed over it. I'll take a look tomorrow and comment. :upside_down_face:

From your reprex, I understand your expecting result is that sum(.) == 0 apply on row , like x[4] + y[4] = 0, is that so ?

case_when will apply on column and currently you don't get 999 because sum(.) applies on each column.

If you add a helper vector with the some pre calculated you get your expected result I think

library(dplyr)
#> 
#> Attachement du package : 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

x<-c(NA,-1,0,0,0)
y<-c(NA,23,NA,0,10)

xy<-as.data.frame(cbind(x=x,y=y))

xy
#>    x  y
#> 1 NA NA
#> 2 -1 23
#> 3  0 NA
#> 4  0  0
#> 5  0 10

xy%>%
  # add a helper column
  mutate(xpy = x + y) %>%
  mutate_at(c("x","y"),
            ~case_when(
              xpy == 0 ~ 999,
              . %in% c(NA,0) ~ NA_real_,
              TRUE ~ .)
            ) %>%
  # remove the helper colum
  select(-xpy)
#>     x   y
#> 1  NA  NA
#> 2  -1  23
#> 3  NA  NA
#> 4 999 999
#> 5  NA  10

Created on 2019-11-26 by the reprex package (v0.3.0)

you can also for example deal with your data.frame using pmap to do a rowise operation

library(dplyr, warn.conflicts = FALSE)

x<-c(NA,-1,0,0,0)
y<-c(NA,23,NA,0,10)

xy<-as.data.frame(cbind(x=x,y=y))

xy
#>    x  y
#> 1 NA NA
#> 2 -1 23
#> 3  0 NA
#> 4  0  0
#> 5  0 10

xy %>%
  purrr::pmap_df(function(x, y) {
    xpy = x + y
    recode_custom <- function(v) {
      case_when(
        xpy == 0 ~ 999,
        v %in% c(NA,0) ~ NA_real_,
        TRUE ~ v)
    }
    tibble::tibble(
       x = recode_custom(x),
       y = recode_custom(y)
    )
  })
#> # A tibble: 5 x 2
#>       x     y
#>   <dbl> <dbl>
#> 1    NA    NA
#> 2    -1    23
#> 3    NA    NA
#> 4   999   999
#> 5    NA    10

Created on 2019-11-26 by the reprex package (v0.3.0)

3 Likes

Sometimes conciseness is a foe to clarity. Sort of a corollary to the admonition against premature optimization.

suppressPackageStartupMessages(library(dplyr))

x <-c(NA,-1,0,0,0)
y <-c(NA,23,NA,0,10)

xy <-as.data.frame(cbind(x=x,y=y))
xy <- xy %>% mutate(col_sum = x + y) %>% mutate(x = ifelse(col_sum == 0, 999, x)) %>% mutate(y = ifelse(col_sum == 0, 999, y))
xy
#>     x   y col_sum
#> 1  NA  NA      NA
#> 2  -1  23      22
#> 3  NA  NA      NA
#> 4 999 999       0
#> 5   0  10      10

Created on 2019-11-25 by the reprex package (v0.3.0)
Gives part of the solution. However, there's an NA problem

42 + NA
#> [1] NA

Created on 2019-11-25 by the reprex package (v0.3.0)

Thank you al for those brilliant solutions.
Cderv hit the nail in the head.
And the solution can extend to many columns as well.
Cheers
Norm

2 Likes

For the benefit of those who follow, please mark @cderv's as the solution.

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.