How to conditionally remove missing values (long-format)

Hi, I have a long-format dataset with 3 assessment points. Something like this:

dataset = data.frame("id"=c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5),
"assessment"=c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3),
"scoreA"=c(7,9,5,NA,5,11,2,3,9,1,NA,NA,7,NA,5),
"scoreB"=c(1,2,7,6,1,11,3,3,2,1,12,NA,NA,4,5))
dataset

I would like to remove all observations belonging to the same ID if there is any NA at assessment 1. For instance, IDs 2 and 5 have NAs at assessment 1, so they should be excluded.

I'm trying to sort this out using dplyr and tidyr functions such as "group_by" and "filter".

Thanks!!

Many thanks for the help. Sorry, I don't think I explained well what I am looking for.

I would like to remove all rows for an ID only when there is an NA at assessment = 1.

In this dataset, IDs 2 and 5 have NAs at assessment = 1 for scoreA or scoreB, so they should be removed. The other IDs can stay.

Thanks

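Try this. It collects the ids that have an NA in any column at assessment 1, then drops every row belonging to those ids.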

dataset  <-  data.frame("id"=c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5),
                        "assessment"=c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3),
                        "scoreA"=c(7,9,5,NA,5,11,2,3,9,1,NA,NA,7,NA,5),
                        "scoreB"=c(1,2,7,6,1,11,3,3,2,1,12,NA,NA,4,5))
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

# ids with an NA in any column at assessment 1
to_remove <- dataset |>
  filter(assessment == 1, if_any(everything(), is.na)) |>
  pull(id)

dataset <- dataset |> filter(!id %in% to_remove)

Created on 2025-03-18 with reprex v2.1.1
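
Since you mentioned group_by and filter, here is a rough sketch of the same idea as a single grouped filter. It assumes scoreA and scoreB are the only columns that need checking, and the name `kept` is just something I made up for the example. Treat it as one possible alternative rather than the definitive way to do it.

```r
library(dplyr)

# Same data as in the question, rebuilt so the example runs on its own
dataset <- data.frame(
  id         = rep(1:5, each = 3),
  assessment = rep(1:3, times = 5),
  scoreA     = c(7, 9, 5, NA, 5, 11, 2, 3, 9, 1, NA, NA, 7, NA, 5),
  scoreB     = c(1, 2, 7, 6, 1, 11, 3, 3, 2, 1, 12, NA, NA, 4, 5)
)

# Keep an id only if none of its assessment-1 rows has an NA in either score.
# any() is evaluated once per id, so the whole group is kept or dropped together.
kept <- dataset |>
  group_by(id) |>
  filter(!any(assessment == 1 & (is.na(scoreA) | is.na(scoreB)))) |>
  ungroup()

unique(kept$id)
#> [1] 1 3 4
```

ID 4 stays even though it has NAs at assessments 2 and 3, because only assessment 1 is checked.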

Brilliant! Many thanks for the help! Solved!