Removing items with missing values at T1 OR T2

gxm204 · March 2, 2019, 10:40pm

Hi, I have two data frames with the same ID keys at T1 and T2. What is the best way to trim down both data frames so that any ID with an NA in either data frame is removed from both?

This is what I came up with but it seems kind of clunky.

library(dplyr)

# two data frames at t1 and t2
t1 <- data.frame(id = c(1,2,3,4,5,6,7),
                 y = c(100,NA,300,400,NA,600,700))

t2 <- data.frame(id = c(1,2,3,4,5,6,7),
                 y = c(NA,200,300,NA,500,600,700))

# remove IDs with an NA at t1 OR t2
# (cbind() pairs rows by position, so both data frames must list the IDs in the same order)
tcombined <- cbind(t1,t2) %>% na.omit()

# split back into the respective data frames
t1 <- tcombined[,1:2]
t2 <- tcombined[,3:4]

Appreciate any thoughts.
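The cbind() approach only works if both data frames list the same IDs in the same row order. A minimal base R sketch that instead matches on the id column is below. It assumes each id appears at most once per data frame, and the names ids_t1, ids_t2, keep_ids, t1_clean and t2_clean are illustrative only. complete.cases() checks every column, so an NA in any column marks that ID as incomplete.

```r
# Example data from the question
t1 <- data.frame(id = c(1, 2, 3, 4, 5, 6, 7),
                 y  = c(100, NA, 300, 400, NA, 600, 700))
t2 <- data.frame(id = c(1, 2, 3, 4, 5, 6, 7),
                 y  = c(NA, 200, 300, NA, 500, 600, 700))

# IDs with no missing values in each data frame (checks all columns)
ids_t1 <- t1$id[complete.cases(t1)]
ids_t2 <- t2$id[complete.cases(t2)]

# keep only IDs that are complete at both T1 and T2
keep_ids <- intersect(ids_t1, ids_t2)

# filter each data frame by id value, so row order does not matter
t1_clean <- t1[t1$id %in% keep_ids, ]
t2_clean <- t2[t2$id %in% keep_ids, ]
```

With the example data, keep_ids is 3, 6, 7. An ID that is missing entirely from one data frame is also dropped, because it never appears in that frame's complete IDs.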

limacina · March 3, 2019, 2:48am

Here's another way to do it, one that may scale up better. Note: this only checks for NAs in the y column of the t1 and t2 data frames, and, like the cbind() version, it assumes both data frames have their IDs in the same row order.

# create a logical vector of which rows to keep.
# a row is kept only if is.na() is FALSE in both t1 and t2.
index = !is.na(t1$y) & !is.na(t2$y)

# use logical indexing to select only the rows of t1 and t2 where index is TRUE:
t1 = t1[index,]
t2 = t2[index,]
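The logical-index idea extends to checking every column and to more than two time points. A minimal sketch, assuming all data frames share the same row order: complete.cases() replaces !is.na(...$y) so NAs in any column count, and Reduce() combines the per-frame vectors with &. The third data frame t3 and the list name waves are hypothetical, added only to show more than two time points.

```r
# Example data from the thread, plus a hypothetical third time point t3
t1 <- data.frame(id = 1:7, y = c(100, NA, 300, 400, NA, 600, 700))
t2 <- data.frame(id = 1:7, y = c(NA, 200, 300, NA, 500, 600, 700))
t3 <- data.frame(id = 1:7, y = c(10, 20, 30, 40, 50, NA, 70))

waves <- list(t1 = t1, t2 = t2, t3 = t3)

# complete.cases() is TRUE for rows with no NA in any column
complete_by_wave <- lapply(waves, complete.cases)

# a row is kept only if it is complete in every data frame
index <- Reduce(`&`, complete_by_wave)

# apply the same logical index to each data frame
waves_clean <- lapply(waves, function(df) df[index, ])
```

With only t1 and t2, and y as the only column that can hold NAs, index is the same as !is.na(t1$y) & !is.na(t2$y). Here the hypothetical t3 also removes row 6, leaving rows 3 and 7.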

gxm204 · March 3, 2019, 4:56pm

I like it! Thanks so much.

andresrcs · March 3, 2019, 4:56pm

If your question's been answered (even by you!), would you mind choosing a solution? It helps other people see which questions still need help, or find solutions if they have similar problems.

gxm204 · March 3, 2019, 5:09pm

10-4, marked as solution. Thanks

system · March 10, 2019, 5:09pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.