Hi, I have two data frames with the same ID keys, measured at T1 and T2. What is the best way to trim both data frames so that every ID with an NA in either data frame is removed from both?
This is what I came up with, but it seems kind of clunky:
library(dplyr)
# two data frames at t1 and t2
t1 <- data.frame(id = c(1, 2, 3, 4, 5, 6, 7),
                 y = c(100, NA, 300, 400, NA, 600, 700))
t2 <- data.frame(id = c(1, 2, 3, 4, 5, 6, 7),
                 y = c(NA, 200, 300, NA, 500, 600, 700))
# remove IDs with NAs at t1 OR t2
tcombined <- cbind(t1,t2) %>% na.omit()
# go back to respective data frames
t1 <- tcombined[,1:2]
t2 <- tcombined[,3:4]
Here's another way to do it, one that may scale up better. Note: this only checks for NAs in the y column of t1 and t2, and it assumes the rows of both data frames are in the same id order.
# create a logical vector marking which rows to keep.
# to keep a row, is.na has to be FALSE in both data frames.
index <- !is.na(t1$y) & !is.na(t2$y)
# use logical indexing to select only the rows of t1 and t2 where "index" is TRUE:
t1 <- t1[index, ]
t2 <- t2[index, ]
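If the two data frames might not be sorted the same way, matching on id instead of row position may be safer. Below is a minimal base R sketch of that idea. The helper name `keep_complete_ids` and its `key` argument are made up for illustration, and t2 is deliberately shuffled here to show that row order does not matter. It treats a row as incomplete if any column is NA, not just y.

```r
# Hypothetical helper (not from any package): keep only the ids that have
# no NA in ANY of the supplied data frames, matching on the key column.
keep_complete_ids <- function(..., key = "id") {
  frames <- list(...)
  # ids whose row is fully non-NA, computed per data frame
  complete_ids <- lapply(frames, function(d) d[[key]][complete.cases(d)])
  # ids that are complete in every data frame
  ids <- Reduce(intersect, complete_ids)
  # filter each data frame by id; rows keep their original order
  lapply(frames, function(d) d[d[[key]] %in% ids, , drop = FALSE])
}

t1 <- data.frame(id = c(1, 2, 3, 4, 5, 6, 7),
                 y = c(100, NA, 300, 400, NA, 600, 700))
# same values as the original t2, but with the rows in reverse order
t2 <- data.frame(id = c(7, 6, 5, 4, 3, 2, 1),
                 y = c(700, 600, 500, NA, 300, 200, NA))

cleaned <- keep_complete_ids(t1, t2)
t1_clean <- cleaned[[1]]
t2_clean <- cleaned[[2]]
```

Because the helper takes `...`, the same call should also work for three or more time points, e.g. `keep_complete_ids(t1, t2, t3)`, as long as each data frame has the key column.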