How do you remove NAs from a column without affecting others?

meitei · December 3, 2022, 5:49pm

xyz=data.frame(name=c("a","b","a","b",
                          "a","b")
                   ,maths=c(7,8,NA,NA,NA,NA)
                   ,science=c(NA,NA,6,8,NA,NA)
                   ,history=c(NA,NA,NA,NA,6,7))

How can I remove the NAs from the above data frame?
Here is the expected output:

xyz=data.frame(name=c("a","b")
               ,maths=c(7,8)
               ,science=c(6,8)
               ,history=c(6,7))

startz · December 3, 2022, 6:00pm

This may sound like a strange question, but are you sure you want to do that? Does the data always come in exact pairs that repeat? In general, if you remove the NAs from each column separately, you change which row each remaining value belongs to, and you may end up with a different number of values in each column, which a data frame does not permit.
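
As a minimal sketch of that concern, using a hypothetical data frame (`uneven` is made up for illustration and is not the poster's data): when the non-missing values do not line up in pairs, dropping NAs column by column leaves vectors of different lengths, and they can no longer be combined into a data frame.

```r
# Hypothetical data where the non-missing values do not line up in pairs
uneven <- data.frame(
  name    = c("a", "b", "a", "b"),
  maths   = c(7, 8, 9, NA),
  science = c(NA, NA, 6, 8)
)

# Drop NAs from each value column independently
cleaned <- lapply(uneven[c("maths", "science")], function(col) col[!is.na(col)])

# The two columns now have different lengths (3 and 2)
lengths(cleaned)

# Recombining them fails because the row counts differ
result <- try(as.data.frame(cleaned), silent = TRUE)
inherits(result, "try-error")
```

The link between each value and its `name` is also lost in `cleaned`, which is why the row association matters here.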

meitei · December 3, 2022, 6:04pm

Yes, this is a reprex of my dataset.

scottyd22 · December 3, 2022, 6:07pm

Below is one way to achieve your desired output using pivot_longer and pivot_wider.

library(tidyverse)

xyz=data.frame(name=c("a","b","a","b",
                      "a","b")
               ,maths=c(7,8,NA,NA,NA,NA)
               ,science=c(NA,NA,6,8,NA,NA)
               ,history=c(NA,NA,NA,NA,6,7))

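xyz = xyz |>
  # reshape to long format: one row per name/subject combination
  pivot_longer(cols = -'name', names_to = 'group', names_repair = 'minimal') |>
  # drop the rows whose value is missing
  filter(!is.na(value)) |>
  # reshape back to wide format: one column per subject
  pivot_wider(names_from = group, values_from = value)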

xyz
#> # A tibble: 2 × 4
#>   name  maths science history
#>   <chr> <dbl>   <dbl>   <dbl>
#> 1 a         7       6       6
#> 2 b         8       8       7

Created on 2022-12-03 with reprex v2.0.2.9000
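
To see why the reshape works, it may help to run it one step at a time. The sketch below uses the same data and the same tidyr/dplyr calls; the intermediate names `long`, `long_complete`, and `wide` are introduced here only for illustration.

```r
library(tidyr)
library(dplyr)

xyz <- data.frame(
  name    = c("a", "b", "a", "b", "a", "b"),
  maths   = c(7, 8, NA, NA, NA, NA),
  science = c(NA, NA, 6, 8, NA, NA),
  history = c(NA, NA, NA, NA, 6, 7)
)

# Step 1: one row per original row and subject (6 rows x 3 subjects = 18 rows)
long <- pivot_longer(xyz, cols = -name, names_to = "group")

# Step 2: keep only the rows that actually hold a value (6 rows remain)
long_complete <- filter(long, !is.na(value))

# Step 3: spread the subjects back into columns, giving one row per name
wide <- pivot_wider(long_complete, names_from = group, values_from = value)

nrow(long)
nrow(long_complete)
wide
```

This relies on each name having exactly one non-missing value per subject, as in the reprex. If a name had several values for the same subject, `pivot_wider` would return list-columns with a warning instead of plain numeric columns.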
