How to delete duplicates?

merge_data.pdf (87.4 KB)

  1. Delete duplicates from a column. I tried the following, but it is not helping:
    no_duplicates <- merged_data[!duplicated(merged_data), ]

  2. Make a new variable called "drug_in_bcx_window", which is 1 if the drug was given in the 2-day window and 0 otherwise.

Hi @Bishwaraj_Deo
I converted your PDF data into a reprex.
Making the new column is straightforward, but there are no duplicate rows in this dataframe.

suppressPackageStartupMessages(library(tidyverse))

dat <- read.table(header=TRUE, sep=" ", text="
patient_id day_given antibiotic_type route
1 2 ciprofloxacin IV
1 4 ciprofloxacin IV
1 6 ciprofloxacin IV
1 7 doxycycline IV
1 9 doxycycline IV
1 15 penicillin IV
1 16 doxycycline IV
1 18 ciprofloxacin IV
8 1 doxycycline PO
8 2 penicillin IV
8 3 doxycycline IV
8 6 doxycycline PO
8 8 penicillin PO
8 12 penicillin IV
9 8 doxycycline IV
9 12 doxycycline PO
12 4 doxycycline PO
12 9 doxycycline IV
16 1 doxycycline IV
16 4 amoxicillin IV
19 3 doxycycline PO
19 5 amoxicillin IV
19 6 ciprofloxacin IV
19 10 doxycycline IV
19 12 penicillin IV
23 1 doxycycline IV
23 1 penicillin IV
23 3 amoxicillin IV
23 3 ciprofloxacin IV
23 3 doxycycline IV
23 4 doxycycline IV
23 5 ciprofloxacin PO
23 5 doxycycline IV
23 6 doxycycline IV
23 6 doxycycline PO
23 8 amoxicillin IV
23 9 doxycycline PO
23 10 amoxicillin PO
23 10 doxycycline IV
23 10 penicillin PO
23 11 doxycycline PO
23 12 doxycycline IV
23 14 doxycycline IV
23 14 penicillin IV
23 15 doxycycline PO
23 16 ciprofloxacin IV
40 2 amoxicillin PO
40 2 doxycycline IV
40 2 penicillin IV
")

# Create new column
dat2 <- dat %>% 
  mutate(drug_in_bcx_window = ifelse(day_given <= 2, 1, 0))
dat2
#>    patient_id day_given antibiotic_type route drug_in_bcx_window
#> 1           1         2   ciprofloxacin    IV                  1
#> 2           1         4   ciprofloxacin    IV                  0
#> 3           1         6   ciprofloxacin    IV                  0
#> 4           1         7     doxycycline    IV                  0
#> 5           1         9     doxycycline    IV                  0
#> 6           1        15      penicillin    IV                  0
#> 7           1        16     doxycycline    IV                  0
#> 8           1        18   ciprofloxacin    IV                  0
#> 9           8         1     doxycycline    PO                  1
#> 10          8         2      penicillin    IV                  1
#> 11          8         3     doxycycline    IV                  0
#> 12          8         6     doxycycline    PO                  0
#> 13          8         8      penicillin    PO                  0
#> 14          8        12      penicillin    IV                  0
#> 15          9         8     doxycycline    IV                  0
#> 16          9        12     doxycycline    PO                  0
#> 17         12         4     doxycycline    PO                  0
#> 18         12         9     doxycycline    IV                  0
#> 19         16         1     doxycycline    IV                  1
#> 20         16         4     amoxicillin    IV                  0
#> 21         19         3     doxycycline    PO                  0
#> 22         19         5     amoxicillin    IV                  0
#> 23         19         6   ciprofloxacin    IV                  0
#> 24         19        10     doxycycline    IV                  0
#> 25         19        12      penicillin    IV                  0
#> 26         23         1     doxycycline    IV                  1
#> 27         23         1      penicillin    IV                  1
#> 28         23         3     amoxicillin    IV                  0
#> 29         23         3   ciprofloxacin    IV                  0
#> 30         23         3     doxycycline    IV                  0
#> 31         23         4     doxycycline    IV                  0
#> 32         23         5   ciprofloxacin    PO                  0
#> 33         23         5     doxycycline    IV                  0
#> 34         23         6     doxycycline    IV                  0
#> 35         23         6     doxycycline    PO                  0
#> 36         23         8     amoxicillin    IV                  0
#> 37         23         9     doxycycline    PO                  0
#> 38         23        10     amoxicillin    PO                  0
#> 39         23        10     doxycycline    IV                  0
#> 40         23        10      penicillin    PO                  0
#> 41         23        11     doxycycline    PO                  0
#> 42         23        12     doxycycline    IV                  0
#> 43         23        14     doxycycline    IV                  0
#> 44         23        14      penicillin    IV                  0
#> 45         23        15     doxycycline    PO                  0
#> 46         23        16   ciprofloxacin    IV                  0
#> 47         40         2     amoxicillin    PO                  1
#> 48         40         2     doxycycline    IV                  1
#> 49         40         2      penicillin    IV                  1

# Look for possible duplicated whole rows
# See: https://www.statology.org/dplyr-find-duplicates/

# display all duplicate rows
dat2 %>% 
  arrange(patient_id, day_given, antibiotic_type, route) %>% 
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()
#> # A tibble: 0 × 5
#> # ℹ 5 variables: patient_id <int>, day_given <int>, antibiotic_type <chr>,
#> #   route <chr>, drug_in_bcx_window <dbl>

# i.e. there are no duplicated rows in this dataframe!

Created on 2023-05-07 with reprex v2.0.2
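
A note on why `merged_data[!duplicated(merged_data), ]` may appear to do nothing: `duplicated()` applied to a whole data frame flags a row only when every column matches an earlier row. If the goal is one row per combination of certain columns, the check can be restricted to those columns. The sketch below assumes that a "duplicate" means the same patient_id, antibiotic_type and route. That is a guess at the intent, not something stated in the question. The rows are a small subset copied from the reprex above.

```r
# A few rows copied from the reprex data above
dat_small <- read.table(header = TRUE, text = "
patient_id day_given antibiotic_type route
1 2 ciprofloxacin IV
1 4 ciprofloxacin IV
1 7 doxycycline IV
8 1 doxycycline PO
8 3 doxycycline IV
8 6 doxycycline PO
")

# Assumption: a 'duplicate' repeats the same patient, drug and route
key_cols <- c("patient_id", "antibiotic_type", "route")

# Whole-row check: every row is unique, so nothing is removed
whole_row <- dat_small[!duplicated(dat_small), ]

# Key-column check: keeps the first row for each key combination
by_key <- dat_small[!duplicated(dat_small[key_cols]), ]

nrow(whole_row)
#> [1] 6
nrow(by_key)
#> [1] 4
```

With dplyr loaded, `distinct(dat_small, patient_id, antibiotic_type, route, .keep_all = TRUE)` should keep the same four rows.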

Actually, I have updated the uploaded file. The file you worked from was uploaded by mistake.
Please check the newly attached file. Thank you!

It would be better if you provided a corrected reprex. Use dput(your_dataframe) as an easy way to represent your data in text format for inclusion here via cut-and-paste.
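
If the full dput() output is too long to post, a subset of rows is usually enough. A minimal sketch follows; `your_dataframe` is a hypothetical stand-in built here only so that the example runs, and the choice of 20 rows is arbitrary.

```r
# Hypothetical stand-in for the real data frame
your_dataframe <- data.frame(
  patient_id = c(1L, 1L, 8L),
  antibiotic_type = c("ciprofloxacin", "doxycycline", "penicillin"),
  route = c("IV", "IV", "PO")
)

# Print only the first rows as R code that others can paste and run
dput(head(your_dataframe, 20))

# Round trip: capture the dput() text and rebuild the data frame from it,
# which is what a reader of the post would do after copying the output
txt <- capture.output(dput(head(your_dataframe, 20)))
rebuilt <- eval(parse(text = txt))
all.equal(rebuilt, head(your_dataframe, 20))
```

The text printed by dput() is what goes into the post; readers assign it to a name of their choice to recreate the data.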

The dput() output exceeds the word limit for posts, which is why I am unable to post it.

patient_id antibiotic_type route last_administration_day blood_culture_day
1 1 ciprofloxacin IV 18 3
2 1 ciprofloxacin IV 18 13
3 1 doxycycline IV 16 3
4 1 doxycycline IV 16 13
5 1 penicillin IV 15 3
6 1 penicillin IV 15 13
7 8 doxycycline PO 6 13
8 8 doxycycline PO 6 2
9 8 penicillin IV 12 13
10 8 penicillin IV 12 2
11 8 doxycycline IV 3 13
12 8 doxycycline IV 3 2
13 8 penicillin PO 8 13
14 8 penicillin PO 8 2
15 23 ciprofloxacin IV 16 3
16 23 ciprofloxacin PO 5 3
17 23 doxycycline PO 15 3
18 23 amoxicillin PO 10 3
19 23 penicillin PO 10 3
20 23 doxycycline IV 14 3
21 23 amoxicillin IV 8 3
22 23 penicillin IV 14 3
23 45 doxycycline IV 10 9
24 45 doxycycline IV 10 11
25 45 doxycycline IV 10 4
26 45 doxycycline PO 1 9
27 45 doxycycline PO 1 11
28 45 doxycycline PO 1 4
29 45 penicillin IV 2 9
30 45 penicillin IV 2 11
31 45 penicillin IV 2 4
32 64 doxycycline IV 19 1
33 64 amoxicillin IV 18 1
34 64 ciprofloxacin IV 18 1
35 66 doxycycline IV 14 10
36 66 doxycycline IV 14 9
37 66 penicillin IV 3 10
38 66 penicillin IV 3 9
39 66 doxycycline PO 6 10
40 66 doxycycline PO 6 9
41 69 doxycycline IV 14 6
42 69 doxycycline IV 14 16
43 69 doxycycline IV 14 11
44 69 doxycycline IV 14 7
45 69 doxycycline IV 14 2
46 69 penicillin IV 7 6
47 69 penicillin IV 7 16
48 69 penicillin IV 7 11
49 69 penicillin IV 7 7
50 69 penicillin IV 7 2
51 76 ciprofloxacin IV 8 1
52 76 doxycycline PO 6 1
53 76 penicillin IV 5 1
54 76 amoxicillin IV 4 1
55 76 doxycycline IV 13 1
56 76 penicillin PO 6 1
57 77 penicillin IV 2 3
58 79 doxycycline IV 12 11

Hi @Bishwaraj_Deo
It is tricky and tedious to extract your data for use in RStudio from either the posted PDF file or the table that you have pasted above. Have you already imported this data into RStudio/R? If yes, then just run dput(your_dataframe) and copy the output from the console pane into your reply here. If not, then post a text file of the data (e.g. CSV, or space-delimited). This makes it much easier for people who are willing to assist you.
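
For a table pasted in console-print form like the one above, one possible route is `read.table()`, which treats the first column as row names when the header line has one fewer field than the data rows. The sketch below uses only the first six rows of the pasted table. The window rule in it (last_administration_day within 2 days of blood_culture_day, in either direction) is an assumption made for illustration; the question does not define the window, so adjust the condition to the real definition.

```r
# First rows copied from the pasted table; the leading number is the row name
new_dat <- read.table(header = TRUE, text = "
patient_id antibiotic_type route last_administration_day blood_culture_day
1 1 ciprofloxacin IV 18 3
2 1 ciprofloxacin IV 18 13
3 1 doxycycline IV 16 3
4 1 doxycycline IV 16 13
5 1 penicillin IV 15 3
6 1 penicillin IV 15 13
")

# Assumed rule (not stated in the question): flag rows where the drug was
# last given within 2 days of the blood culture, before or after
new_dat$drug_in_bcx_window <- as.integer(
  abs(new_dat$last_administration_day - new_dat$blood_culture_day) <= 2
)

new_dat
```

A dput() of the real data frame is still the more reliable way to share it, since it preserves column types exactly.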
