How to Filter data by looking another table

Hello R community, I really need your help to solve this problem.

I want to filter Patient_ID and Date_Supplied in Main_table by looking up in Lookup_table with following conditions:

  1. Patient_ID with its corresponding Date_Supplied in Main_table should match in Lookup_table
  2. All the matching Patient_ID with its corresponding Date_Supplied should be filtered in Main_table
  3. All Date_Supplied thereafter for that Patient_ID should be filtered in Main_table

Solution using tidyverse package would be awesome!

Main_table:

library(tidyverse)
library(datapasta)

Main_table <- tibble::tribble(
  ~Patient_ID, ~Date_Supplied,
           1L,    "2/10/2015",
           1L,    "3/10/2015",
           1L,    "4/10/2015",
           2L,    "5/10/2015",
           2L,    "6/10/2015",
           2L,    "7/10/2015",
           2L,    "8/10/2015",
           2L,    "9/10/2015",
           3L,    "1/12/2017",
           3L,    "2/12/2017",
           3L,    "3/12/2017",
           4L,    "1/10/2018",
           4L,    "2/10/2018",
           4L,    "3/10/2018"
  )

Lookup_table:

Lookup_table <- tibble::tribble(
                   ~Patient_ID, ~Date_Supplied,
                            1L,    "2/10/2015",
                            2L,    "7/10/2015",
                            4L,    "3/10/2018"
                   )

Expected Output:

Expected_output <- tibble::tribble(
                     ~Patient_ID, ~Date_Supplied,
                              1L,    "2/10/2015",
                              1L,    "3/10/2015",
                              1L,    "4/10/2015",
                              2L,    "7/10/2015",
                              2L,    "8/10/2015",
                              2L,    "9/10/2015",
                              4L,    "3/10/2018"
                     )

Thank you

Hi, you can achieve what you want with left_join + a bit of dplyr:

suppressPackageStartupMessages(library(tidyverse))

Main_table <- tibble::tribble(
  ~Patient_ID, ~Date_Supplied,
  1L,    "2/10/2015",
  1L,    "3/10/2015",
  1L,    "4/10/2015",
  2L,    "5/10/2015",
  2L,    "6/10/2015",
  2L,    "7/10/2015",
  2L,    "8/10/2015",
  2L,    "9/10/2015",
  3L,    "1/12/2017",
  3L,    "2/12/2017",
  3L,    "3/12/2017",
  4L,    "1/10/2018",
  4L,    "2/10/2018",
  4L,    "3/10/2018"
)

Lookup_table <- tibble::tribble(
  ~Patient_ID, ~Date_Supplied,
  1L,    "2/10/2015",
  2L,    "7/10/2015",
  4L,    "3/10/2018"
)

Expected_output <- tibble::tribble(
  ~Patient_ID, ~Date_Supplied,
  1L,    "2/10/2015",
  1L,    "3/10/2015",
  1L,    "4/10/2015",
  2L,    "7/10/2015",
  2L,    "8/10/2015",
  2L,    "9/10/2015",
  4L,    "3/10/2018"
) %>%
  dplyr::mutate(Date_Supplied = as.Date(Date_Supplied, format = "%d/%m/%Y", tz = "UTC"))

res <- dplyr::left_join(Main_table, Lookup_table, by = "Patient_ID") %>%
  dplyr::mutate_if(is.character, lubridate::as_date, format = "%d/%m/%Y", tz = "UTC") %>%
  dplyr::filter(!is.na(Date_Supplied.y)) %>%
  dplyr::filter(Date_Supplied.x >= Date_Supplied.y) %>%
  dplyr::select(-Date_Supplied.y, Date_Supplied = Date_Supplied.x) 

dplyr::all_equal(res, Expected_output)
#> [1] TRUE

Created on 2018-10-16 by the reprex package (v0.2.1)

2 Likes

Nice!!!! This was the missing link for my problem: filter(Date_Supplied.x >= Date_Supplied.y)

Thank you @mishabalyasin for your help.

1 Like