Hi @Bishwaraj_Deo
I converted your PDF data into a reprex.
Making the new column is straightforward, but there are no duplicate rows in this dataframe.
suppressPackageStartupMessages(library(tidyverse))
dat <- read.table(header=TRUE, sep=" ", text="
patient_id day_given antibiotic_type route
1 2 ciprofloxacin IV
1 4 ciprofloxacin IV
1 6 ciprofloxacin IV
1 7 doxycycline IV
1 9 doxycycline IV
1 15 penicillin IV
1 16 doxycycline IV
1 18 ciprofloxacin IV
8 1 doxycycline PO
8 2 penicillin IV
8 3 doxycycline IV
8 6 doxycycline PO
8 8 penicillin PO
8 12 penicillin IV
9 8 doxycycline IV
9 12 doxycycline PO
12 4 doxycycline PO
12 9 doxycycline IV
16 1 doxycycline IV
16 4 amoxicillin IV
19 3 doxycycline PO
19 5 amoxicillin IV
19 6 ciprofloxacin IV
19 10 doxycycline IV
19 12 penicillin IV
23 1 doxycycline IV
23 1 penicillin IV
23 3 amoxicillin IV
23 3 ciprofloxacin IV
23 3 doxycycline IV
23 4 doxycycline IV
23 5 ciprofloxacin PO
23 5 doxycycline IV
23 6 doxycycline IV
23 6 doxycycline PO
23 8 amoxicillin IV
23 9 doxycycline PO
23 10 amoxicillin PO
23 10 doxycycline IV
23 10 penicillin PO
23 11 doxycycline PO
23 12 doxycycline IV
23 14 doxycycline IV
23 14 penicillin IV
23 15 doxycycline PO
23 16 ciprofloxacin IV
40 2 amoxicillin PO
40 2 doxycycline IV
40 2 penicillin IV
")
# Create new column
dat2 <- dat %>%
  mutate(drug_in_bcx_window = ifelse(day_given <= 2, 1, 0))
dat2
#> patient_id day_given antibiotic_type route drug_in_bcx_window
#> 1 1 2 ciprofloxacin IV 1
#> 2 1 4 ciprofloxacin IV 0
#> 3 1 6 ciprofloxacin IV 0
#> 4 1 7 doxycycline IV 0
#> 5 1 9 doxycycline IV 0
#> 6 1 15 penicillin IV 0
#> 7 1 16 doxycycline IV 0
#> 8 1 18 ciprofloxacin IV 0
#> 9 8 1 doxycycline PO 1
#> 10 8 2 penicillin IV 1
#> 11 8 3 doxycycline IV 0
#> 12 8 6 doxycycline PO 0
#> 13 8 8 penicillin PO 0
#> 14 8 12 penicillin IV 0
#> 15 9 8 doxycycline IV 0
#> 16 9 12 doxycycline PO 0
#> 17 12 4 doxycycline PO 0
#> 18 12 9 doxycycline IV 0
#> 19 16 1 doxycycline IV 1
#> 20 16 4 amoxicillin IV 0
#> 21 19 3 doxycycline PO 0
#> 22 19 5 amoxicillin IV 0
#> 23 19 6 ciprofloxacin IV 0
#> 24 19 10 doxycycline IV 0
#> 25 19 12 penicillin IV 0
#> 26 23 1 doxycycline IV 1
#> 27 23 1 penicillin IV 1
#> 28 23 3 amoxicillin IV 0
#> 29 23 3 ciprofloxacin IV 0
#> 30 23 3 doxycycline IV 0
#> 31 23 4 doxycycline IV 0
#> 32 23 5 ciprofloxacin PO 0
#> 33 23 5 doxycycline IV 0
#> 34 23 6 doxycycline IV 0
#> 35 23 6 doxycycline PO 0
#> 36 23 8 amoxicillin IV 0
#> 37 23 9 doxycycline PO 0
#> 38 23 10 amoxicillin PO 0
#> 39 23 10 doxycycline IV 0
#> 40 23 10 penicillin PO 0
#> 41 23 11 doxycycline PO 0
#> 42 23 12 doxycycline IV 0
#> 43 23 14 doxycycline IV 0
#> 44 23 14 penicillin IV 0
#> 45 23 15 doxycycline PO 0
#> 46 23 16 ciprofloxacin IV 0
#> 47 40 2 amoxicillin PO 1
#> 48 40 2 doxycycline IV 1
#> 49 40 2 penicillin IV 1
# Look for possible duplicated whole rows
# See: https://www.statology.org/dplyr-find-duplicates/
# display all duplicate rows
dat2 %>%
  arrange(patient_id, day_given, antibiotic_type, route) %>%
  group_by_all() %>%
  filter(n() > 1) %>%
  ungroup()
#> # A tibble: 0 × 5
#> # ℹ 5 variables: patient_id <int>, day_given <int>, antibiotic_type <chr>,
#> # route <chr>, drug_in_bcx_window <dbl>
# i.e. there are no duplicated rows in this dataframe!
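If by "duplicate" you meant something looser than identical whole rows, for example the same patient receiving more than one drug on the same day, the whole-row check would not catch it. Here is a minimal sketch of both checks on a small subset of the data. Note that treating `patient_id` and `day_given` as the key columns is only my assumption about what you might want, and `dat_small` and `same_day` are names made up for this illustration.

```r
library(dplyr)

# Small subset of the data above, just for illustration
dat_small <- data.frame(
  patient_id      = c(16, 16, 40, 40, 40),
  day_given       = c(1, 4, 2, 2, 2),
  antibiotic_type = c("doxycycline", "amoxicillin", "amoxicillin",
                      "doxycycline", "penicillin"),
  route           = c("IV", "IV", "PO", "IV", "IV")
)

# Whole-row check in base R: duplicated() flags rows that repeat an earlier row
any(duplicated(dat_small))

# Looser check: rows that share patient_id and day_given (assumed key columns)
same_day <- dat_small %>%
  group_by(patient_id, day_given) %>%
  filter(n() > 1) %>%
  ungroup()
same_day
```

On this subset the whole-row check finds nothing, while the looser check returns the three rows for patient 40 on day 2. Swap in whichever columns define a "duplicate" for your purposes.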
It would be better if you provided us with a corrected reprex. Running dput(your_dataframe) is an easy way to get your data in a text format that you can copy and paste here.
Hi @Bishwaraj_Deo
It is tricky and tedious to extract your data for use in RStudio from either the posted PDF file or the table that you have pasted above. Have you already imported this data into RStudio/R? If yes, then just run dput(your_dataframe) and copy the output from the console pane into your reply here. If not, then post a text file of the data (e.g. CSV, or space-delimited). This makes it much easier for people who are willing to assist you.
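In case it helps, this is roughly what the dput() step looks like. It is only a sketch: `your_dataframe` here is a small made-up stand-in for your real data, and the 20-row cutoff is an arbitrary choice.

```r
# Hypothetical stand-in for your own data frame
your_dataframe <- data.frame(
  patient_id      = c(1L, 1L, 8L),
  day_given       = c(2L, 4L, 1L),
  antibiotic_type = c("ciprofloxacin", "ciprofloxacin", "doxycycline"),
  route           = c("IV", "IV", "PO")
)

# Print a text representation that can be pasted into a forum post
dput(your_dataframe)

# For a large data frame, the first rows are usually enough
dput(head(your_dataframe, 20))

# Whoever reads the post can rebuild the object from the pasted text;
# capture.output() stands in for the copy-and-paste step here
txt <- capture.output(dput(your_dataframe))
rebuilt <- eval(parse(text = txt))
all.equal(rebuilt, your_dataframe)
```

The output of dput() is itself R code (a `structure(list(...))` call), so a reader only has to assign it to a name to get an exact copy of your data, column types included.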