How to delete duplicates?

Bishwaraj_Deo · May 7, 2023, 3:52am

I want to delete duplicates from a column.
no_duplicates <- merged_data[!duplicated(merged_data), ] is not helping.
Make a new variable called "drug_in_bcx_window", which is 1 if the drug was given in the 2-day window and 0 otherwise.

DavoWW · May 7, 2023, 5:48am

Hi @Bishwaraj_Deo
I converted your PDF data into a reprex.
Making the new column is straightforward, but there are no duplicate rows in this dataframe.

suppressPackageStartupMessages(library(tidyverse))

dat <- read.table(header=TRUE, sep=" ", text="
patient_id day_given antibiotic_type route
1 2 ciprofloxacin IV
1 4 ciprofloxacin IV
1 6 ciprofloxacin IV
1 7 doxycycline IV
1 9 doxycycline IV
1 15 penicillin IV
1 16 doxycycline IV
1 18 ciprofloxacin IV
8 1 doxycycline PO
8 2 penicillin IV
8 3 doxycycline IV
8 6 doxycycline PO
8 8 penicillin PO
8 12 penicillin IV
9 8 doxycycline IV
9 12 doxycycline PO
12 4 doxycycline PO
12 9 doxycycline IV
16 1 doxycycline IV
16 4 amoxicillin IV
19 3 doxycycline PO
19 5 amoxicillin IV
19 6 ciprofloxacin IV
19 10 doxycycline IV
19 12 penicillin IV
23 1 doxycycline IV
23 1 penicillin IV
23 3 amoxicillin IV
23 3 ciprofloxacin IV
23 3 doxycycline IV
23 4 doxycycline IV
23 5 ciprofloxacin PO
23 5 doxycycline IV
23 6 doxycycline IV
23 6 doxycycline PO
23 8 amoxicillin IV
23 9 doxycycline PO
23 10 amoxicillin PO
23 10 doxycycline IV
23 10 penicillin PO
23 11 doxycycline PO
23 12 doxycycline IV
23 14 doxycycline IV
23 14 penicillin IV
23 15 doxycycline PO
23 16 ciprofloxacin IV
40 2 amoxicillin PO
40 2 doxycycline IV
40 2 penicillin IV
")

# Create new column
dat %>% 
  mutate(drug_in_bcx_window = ifelse(day_given <= 2, 1, 0)) -> dat2
dat2
#>    patient_id day_given antibiotic_type route drug_in_bcx_window
#> 1           1         2   ciprofloxacin    IV                  1
#> 2           1         4   ciprofloxacin    IV                  0
#> 3           1         6   ciprofloxacin    IV                  0
#> 4           1         7     doxycycline    IV                  0
#> 5           1         9     doxycycline    IV                  0
#> 6           1        15      penicillin    IV                  0
#> 7           1        16     doxycycline    IV                  0
#> 8           1        18   ciprofloxacin    IV                  0
#> 9           8         1     doxycycline    PO                  1
#> 10          8         2      penicillin    IV                  1
#> 11          8         3     doxycycline    IV                  0
#> 12          8         6     doxycycline    PO                  0
#> 13          8         8      penicillin    PO                  0
#> 14          8        12      penicillin    IV                  0
#> 15          9         8     doxycycline    IV                  0
#> 16          9        12     doxycycline    PO                  0
#> 17         12         4     doxycycline    PO                  0
#> 18         12         9     doxycycline    IV                  0
#> 19         16         1     doxycycline    IV                  1
#> 20         16         4     amoxicillin    IV                  0
#> 21         19         3     doxycycline    PO                  0
#> 22         19         5     amoxicillin    IV                  0
#> 23         19         6   ciprofloxacin    IV                  0
#> 24         19        10     doxycycline    IV                  0
#> 25         19        12      penicillin    IV                  0
#> 26         23         1     doxycycline    IV                  1
#> 27         23         1      penicillin    IV                  1
#> 28         23         3     amoxicillin    IV                  0
#> 29         23         3   ciprofloxacin    IV                  0
#> 30         23         3     doxycycline    IV                  0
#> 31         23         4     doxycycline    IV                  0
#> 32         23         5   ciprofloxacin    PO                  0
#> 33         23         5     doxycycline    IV                  0
#> 34         23         6     doxycycline    IV                  0
#> 35         23         6     doxycycline    PO                  0
#> 36         23         8     amoxicillin    IV                  0
#> 37         23         9     doxycycline    PO                  0
#> 38         23        10     amoxicillin    PO                  0
#> 39         23        10     doxycycline    IV                  0
#> 40         23        10      penicillin    PO                  0
#> 41         23        11     doxycycline    PO                  0
#> 42         23        12     doxycycline    IV                  0
#> 43         23        14     doxycycline    IV                  0
#> 44         23        14      penicillin    IV                  0
#> 45         23        15     doxycycline    PO                  0
#> 46         23        16   ciprofloxacin    IV                  0
#> 47         40         2     amoxicillin    PO                  1
#> 48         40         2     doxycycline    IV                  1
#> 49         40         2      penicillin    IV                  1

# Look for possible duplicated whole rows
# See: https://www.statology.org/dplyr-find-duplicates/

# display all duplicate rows
dat2 %>% 
  arrange(patient_id, day_given, antibiotic_type, route) %>% 
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()
#> # A tibble: 0 × 5
#> # ℹ 5 variables: patient_id <int>, day_given <int>, antibiotic_type <chr>,
#> #   route <chr>, drug_in_bcx_window <dbl>

# i.e. there are no duplicated rows in this dataframe!

Created on 2023-05-07 with reprex v2.0.2
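As a side note on why merged_data[!duplicated(merged_data), ] may appear to do nothing: duplicated() on a whole data frame only flags rows that match in every column. The sketch below uses a small made-up data frame (toy is hypothetical, not the poster's file) to contrast that with de-duplicating on a chosen subset of columns. Which columns define a "duplicate" is an assumption you would need to adapt to your own data.

```r
suppressPackageStartupMessages(library(dplyr))

# Toy data (hypothetical): no two rows are identical in every column
toy <- data.frame(
  patient_id      = c(1, 1, 1, 8),
  antibiotic_type = c("ciprofloxacin", "ciprofloxacin", "doxycycline", "penicillin"),
  day_given       = c(2, 4, 7, 2)
)

# Whole-row check: nothing is removed because day_given differs
whole_row <- toy[!duplicated(toy), ]

# Base R: keep the first row for each patient_id / antibiotic_type pair
by_cols_base <- toy[!duplicated(toy[, c("patient_id", "antibiotic_type")]), ]

# dplyr equivalent; .keep_all = TRUE retains the remaining columns
by_cols_dplyr <- distinct(toy, patient_id, antibiotic_type, .keep_all = TRUE)

nrow(whole_row)
nrow(by_cols_base)
nrow(by_cols_dplyr)
```

Both column-based versions keep the first row they encounter for each pair, so sort the data first if a particular row should survive.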

Bishwaraj_Deo · May 7, 2023, 5:58am

Actually, I have updated the uploaded file. The file you worked on was uploaded by mistake.
Please check the newly attached file. Thank you!

DavoWW · May 7, 2023, 6:16am

It would be better if you provided us with a corrected reprex. Use dput(your_dataframe) for an easy way to depict your data in text format for inclusion here via cut-and-paste.
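If the full dput() output is too long to post, a common workaround is to dput() only the first few rows. A minimal sketch, where your_dataframe is a hypothetical stand-in for the real data and 20 is an arbitrary row count:

```r
# Hypothetical stand-in for the poster's data frame
your_dataframe <- data.frame(
  patient_id      = c(1L, 1L, 8L),
  antibiotic_type = c("ciprofloxacin", "doxycycline", "penicillin"),
  stringsAsFactors = FALSE
)

# Print only the first rows as R code, which keeps the post short
dput(head(your_dataframe, 20))

# Round trip: capture the dput() text and rebuild the data frame from it,
# which is what a helper does when pasting the text into their own session
txt <- paste(capture.output(dput(head(your_dataframe, 20))), collapse = "\n")
rebuilt <- eval(parse(text = txt))
all.equal(rebuilt, head(your_dataframe, 20))
```

Make sure the chosen rows still show the problem, for example by including at least one patient with repeated rows.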

Bishwaraj_Deo · May 7, 2023, 6:28am

There is a length limit on posts, which is why I am unable to post the dput() output.

Bishwaraj_Deo · May 7, 2023, 6:33am

	patient_id	antibiotic_type	route	last_administration_day	blood_culture_day
1	1	ciprofloxacin	IV	18	3
2	1	ciprofloxacin	IV	18	13
3	1	doxycycline	IV	16	3
4	1	doxycycline	IV	16	13
5	1	penicillin	IV	15	3
6	1	penicillin	IV	15	13
7	8	doxycycline	PO	6	13
8	8	doxycycline	PO	6	2
9	8	penicillin	IV	12	13
10	8	penicillin	IV	12	2
11	8	doxycycline	IV	3	13
12	8	doxycycline	IV	3	2
13	8	penicillin	PO	8	13
14	8	penicillin	PO	8	2
15	23	ciprofloxacin	IV	16	3
16	23	ciprofloxacin	PO	5	3
17	23	doxycycline	PO	15	3
18	23	amoxicillin	PO	10	3
19	23	penicillin	PO	10	3
20	23	doxycycline	IV	14	3
21	23	amoxicillin	IV	8	3
22	23	penicillin	IV	14	3
23	45	doxycycline	IV	10	9
24	45	doxycycline	IV	10	11
25	45	doxycycline	IV	10	4
26	45	doxycycline	PO	1	9
27	45	doxycycline	PO	1	11
28	45	doxycycline	PO	1	4
29	45	penicillin	IV	2	9
30	45	penicillin	IV	2	11
31	45	penicillin	IV	2	4
32	64	doxycycline	IV	19	1
33	64	amoxicillin	IV	18	1
34	64	ciprofloxacin	IV	18	1
35	66	doxycycline	IV	14	10
36	66	doxycycline	IV	14	9
37	66	penicillin	IV	3	10
38	66	penicillin	IV	3	9
39	66	doxycycline	PO	6	10
40	66	doxycycline	PO	6	9
41	69	doxycycline	IV	14	6
42	69	doxycycline	IV	14	16
43	69	doxycycline	IV	14	11
44	69	doxycycline	IV	14	7
45	69	doxycycline	IV	14	2
46	69	penicillin	IV	7	6
47	69	penicillin	IV	7	16
48	69	penicillin	IV	7	11
49	69	penicillin	IV	7	7
50	69	penicillin	IV	7	2
51	76	ciprofloxacin	IV	8	1
52	76	doxycycline	PO	6	1
53	76	penicillin	IV	5	1
54	76	amoxicillin	IV	4	1
55	76	doxycycline	IV	13	1
56	76	penicillin	PO	6	1
57	77	penicillin	IV	2	3
58	79	doxycycline	IV	12	11

DavoWW · May 7, 2023, 6:43am

Hi @Bishwaraj_Deo
It is tricky and tedious to extract your data for use in RStudio from either the posted PDF file or the table that you have pasted above. Have you already imported this data into RStudio/R? If yes, then just run dput(your_dataframe) and copy the output from the console pane into your reply here. If not, then post a text file of the data (e.g. CSV, or space-delimited). This makes it much easier for people who are willing to assist you.
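In the meantime, here is a minimal sketch using a few rows retyped from the table pasted above (patients 1, 45 and 77 only). It rests on two assumptions that were never confirmed in this thread: that "in the 2 day window" means last_administration_day falls within 2 days either side of blood_culture_day, and that the "duplicates" are the repeated rows per patient_id, antibiotic_type and route that arise from the multiple blood culture days.

```r
suppressPackageStartupMessages(library(dplyr))

# A subset of the pasted table, retyped as space-delimited text
dat <- read.table(header = TRUE, text = "
patient_id antibiotic_type route last_administration_day blood_culture_day
1 ciprofloxacin IV 18 3
1 ciprofloxacin IV 18 13
1 doxycycline IV 16 3
1 doxycycline IV 16 13
1 penicillin IV 15 3
1 penicillin IV 15 13
45 doxycycline IV 10 9
45 doxycycline IV 10 11
45 doxycycline IV 10 4
45 doxycycline PO 1 9
45 doxycycline PO 1 11
45 doxycycline PO 1 4
45 penicillin IV 2 9
45 penicillin IV 2 11
45 penicillin IV 2 4
77 penicillin IV 2 3
")

# Collapse to one row per patient / drug / route.
# ASSUMPTION: the window is |administration day - culture day| <= 2,
# and the flag is 1 if any blood culture day satisfies it.
result <- dat %>%
  group_by(patient_id, antibiotic_type, route) %>%
  summarise(
    drug_in_bcx_window = as.integer(
      any(abs(last_administration_day - blood_culture_day) <= 2)
    ),
    .groups = "drop"
  )
result
```

If the window is meant to be one-sided (for example, only the 2 days before the culture), replace the abs() condition with the appropriate comparison.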
