Meg23
July 25, 2022, 10:53am
1
Hi,
I am trying to remove all rows from a dataframe based on a value in a column, but for each unique ID as shown below.
ID | Time | Data | Pregnancy
1 | 0 | 1 | Yes
1 | 1 | 0 | No
2 | 3 | 1 | No
3 | 0 | 1 | No
3 | 1 | 0 | Yes
3 | 4 | 1 | No
so it would show in the end:
ID | Time | Data | Pregnancy
2 | 3 | 1 | No
3 | 0 | 1 | No
So I am trying to ask, if df$Pregnancy = "Yes" then remove all rows for that same ID where Time is after or equal to the Time at Pregnancy.
ID <- c("1", "1", "2", "3", "3", "3")
Time <- c(0L, 1L, 3L, 0L, 1L, 4L)
data <- c("1", "0", "1", "0", "0", "1")
Pregnancy <- c("yes", "No", "No", "No", "Yes", "No")
df <- data.frame(ID, Time, data, Pregnancy)
I am stuck as to the code I would require to do this, and any help would be greatly appreciated as still very new to R!
Ven
July 25, 2022, 12:02pm
2
I've got 2 solutions for you.
dplyr
library(dplyr)
df |>
subset(preg == "No") |>
group_by(id) |>
slice_min(time)
data.table
library(data.table)
dt[preg == "No", .(time = min(time)), .(id, data, preg)][, .(id, time, data, preg)][]
I'd also suggest that you format your post so that it's more legible in the future.
Meg23
July 25, 2022, 12:24pm
3
Hi,
Thank you for your reply, I tried both your solutions and have formatted the data better now!
Using the dplyr code you suggested, the original:
ID Time data Pregnancy
1 1 0 1 yes
2 1 1 0 No
3 2 3 1 No
4 3 0 0 No
5 3 1 0 Yes
6 3 4 1 No
turned into:
# A tibble: 3 × 4
# Groups: ID [3]
ID Time data Pregnancy
<chr> <int> <chr> <chr>
1 1 1 0 No
2 2 3 1 No
3 3 0 0 No
So some worked correctly, whereas the row 1 should have also been removed, since that ID was pregnant at Time 0 and it has kept in the row for this ID (1) at Time 1 which is after the pregnancy. But the others seemed to have worked.
Also, the second data.table package gave me this error:
dt[Pregnancy == "No", .(Time = min(Time)), .(ID, data, Pregnancy)][, .(ID, Time, data, Pregnancy)][]
Error in .(Time = min(Time)) : could not find function "."
Thank you again.
Ven
July 25, 2022, 12:37pm
4
Meg23:
So some worked correctly, whereas the row 1 should have also been removed, since that ID was pregnant at Time 0 and it has kept in the row for this ID (1) at Time 1 which is after the pregnancy. But the others seemed to have worked.
If you're keeping with dplyr
, then you'd rearrange it like this;
df |>
group_by(id) |>
slice_min(time) |>
subset(preg == "No")
Meg23
July 25, 2022, 12:50pm
5
Brilliant, thank you so much that worked!
Ven
July 25, 2022, 1:23pm
6
Meg23:
dt[Pregnancy == "No", .(Time = min(Time)), .(ID, data, Pregnancy)][, .(ID, Time, data, Pregnancy)][]
Error in .(Time = min(Time)) : could not find function "."
I'm not entirely sure why it would return an error.
This may be a possible solution; r - Data table error could not find function "." - Stack Overflow
Regardless, this is how I would rewrite the data.table
code to match the outcome of the dplyr
code.
dt[, .SD[which.min(Time), .(Time, data, Pregnancy)], ID][Pregnancy == "No"][]
Meg23
July 25, 2022, 1:38pm
7
Thank you I appreciate your help!
1 Like
system
Closed
August 1, 2022, 1:38pm
8
This topic was automatically closed 7 days after the last reply. New replies are no longer allowed. If you have a query related to it or one of the replies, start a new topic and refer back with a link.