Replace NA's with values from other columns

Normal_distribution · May 30, 2022, 9:18pm

I have a column with several NA's that I want to be replaced by values from other columns of the same data frame. Here's an example

test <- tibble(x = c(1,2,NA,NA,5,NA,3),
               y1 = c(NA,NA,3,NA,NA,NA,NA),
               y2 = c(NA,NA,NA,15,NA,1,NA)
               )

Ideally I want x to be (1,2,3,15,5,1,3). The easiest way I can find that does this is to use the coalesce() function, so essentially I would do test %>% mutate(x = coalesce(x,y1,y2)). That said, for my specific data set I have the following problems:

There are more than two columns to take values from, i.e. instead of having just y1 and y2 in the above example, I have a lot more columns.
The names of the columns to take values from have very weird names with little patterns.

In other words, instead of having to write y1,y2 in the coalesce() function above, I would like to somehow call them without specifying them. Any idea how this can be achieved? Many thanks!

Dobrokhotov1989 · May 31, 2022, 4:33am

If you want to use all columns in coalesce() then you can use coalesce(!!!.). But in this case make sure that x is your first column or there is no more than one non-NA value in each row

library(tidyverse)

# Your original example
test <- tibble(x = c(1,2,NA,NA,5,NA,3),
               y1 = c(NA,NA,3,NA,NA,NA,NA),
               y2 = c(NA,NA,NA,15,NA,1,NA)
)

(x <- test %>%
  mutate(x = coalesce(!!!.)))
#> # A tibble: 7 x 3
#>       x    y1    y2
#>   <dbl> <dbl> <dbl>
#> 1     1    NA    NA
#> 2     2    NA    NA
#> 3     3     3    NA
#> 4    15    NA    15
#> 5     5    NA    NA
#> 6     1    NA     1
#> 7     3    NA    NA

# Why x should be first column
# (added extra non-NA to y1 and make x second col)
test2 <- tibble(y1 = c(3,NA,3,NA,NA,NA,NA),
                x = c(1,2,NA,NA,5,NA,3),
                y2 = c(NA,NA,NA,15,NA,1,NA)
)

(x2 <- test2 %>%
  mutate(x = coalesce(!!!.)))
#> # A tibble: 7 x 3
#>      y1     x    y2
#>   <dbl> <dbl> <dbl>
#> 1     3     3    NA
#> 2    NA     2    NA
#> 3     3     3    NA
#> 4    NA    15    15
#> 5    NA     5    NA
#> 6    NA     1     1
#> 7    NA     3    NA

# While x was identical in original tibbles
identical(test$x, test2$x)
#> [1] TRUE

# The first element in the second case was taken from y1
x$x == x2$x
#> [1] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE

If you don't want to use all columns in coalesce() then you have to find a way to define which column to use. You might, for instance, exclude some columns (if it's easier) with select(-colname) or select a range of consecutive columns with either indexes select(1:2) or names select(x:y1). Please check the 'selection features' in ?select to get more ideas on how to dynamically define columns of interest.

system · June 21, 2022, 4:33am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.