if statement to delete all subsequent rows after a time point

Meg23 · July 22, 2022, 8:01am

Hi,

I am very new to R and am trying to delete rows from a data frame once a condition is met. Specifically, I want to delete a row if one date column is greater than the other date column, but only if another column does not have "None" in that row.

I have tried the script:
newdf <- for(df$ID in df)
if(df$Endpoint != "None" && df$VISITDATE>df$DATE)
drop(df$ID)

This throws me an error -
df$Endpoint != "None" && df$VISITDATE > :
'length(x) = 17012 > 1' in coercion to 'logical(1)'

I would really appreciate any help on this, as I have been on so many forums with no luck so far! (The code I tried above may also be very wrong, as I am so new to all this.)

Thank you.

jrkrideau · July 22, 2022, 10:59am

Hi, welcome to the forum.

I think we need to see some sample data. I'd suggest having a look at "FAQ: How to do a minimal reproducible example (reprex) for beginners".

A handy way to supply some sample data is the dput() function. In the case of a large dataset something like dput(head(mydata, 100)) should supply the data we need.
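As a rough illustration of what dput() gives you (the tiny data frame and its column names here are made up purely for the example):

```r
# Hypothetical toy data, only to show what dput() produces
mydata <- data.frame(
  ID = c(1, 2),
  visit = as.Date(c("2000-03-12", "2013-12-05"))
)

# Prints R code that rebuilds the object; paste that text into your post
dput(head(mydata, 100))

# Round trip: capture the printed text and evaluate it to recreate the data
rebuilt <- eval(parse(text = paste(capture.output(dput(mydata)), collapse = "\n")))
identical(rebuilt, mydata)
```

Anyone reading the post can then assign the pasted structure(...) text to a name and work with the same columns and classes you have.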

I have never encountered drop before but it is not doing what you think it is. See

?drop
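For what it's worth, base R's drop() removes array dimensions of length one; it does not delete rows. A tiny sketch with a made-up matrix:

```r
m <- matrix(1:3, nrow = 1)  # a 1 x 3 matrix (toy example)
dim(m)                      # two dimensions
v <- drop(m)                # the length-one dimension is dropped
is.null(dim(v))             # v is now a plain vector with no dim attribute
```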

At a quick guess you may want

subset(df, df$Endpoint == "None" & df$VISITDATE > df$DATE)
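On the error itself: && (like if) expects a single TRUE/FALSE value, whereas & compares whole columns element by element, which is what you need when filtering rows. A small sketch with made-up vectors standing in for your columns:

```r
# Toy values, not your data
endpoint  <- c("Cancer", "None", "Cancer")
visitdate <- as.Date(c("2000-03-12", "2013-12-05", "2005-05-23"))
date      <- as.Date(c("2000-01-12", "2013-11-13", "2013-11-13"))

# Element-wise comparison: one TRUE/FALSE per row
flag <- endpoint != "None" & visitdate > date
flag

# Negate the logical vector to mark the rows to keep; no loop or if() needed
keep <- !flag
keep
```

A logical vector like keep can be used directly to index rows, e.g. df[keep, ], or the condition can be passed to subset().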

Meg23 · July 22, 2022, 11:17am

[Uploaded image: demodf]

also typed out data frame below as upload is poor quality.

ID | visitdate | dateofdiagnosis | Illness
1 | 2000-03-12 | 2000-01-12 | Cancer
2 | 2013-12-05 | NA | None
3 | 2005-05-23 | 2013- 11-13 | Cancer
3 | 2017-11-22 | 2013-11-13 | Cancer

Hi,

Thank you for your help! I have uploaded above a demo df (Endpoint in above comment is now Illness column) - so through the code I am hoping to delete row 1 since the visitdate is later than the diagnosis date, leave in the second row since illness says "none", keep row 3 since visitdate is less than the date of diagnosis (so they weren't ill then), and delete row 4 which is the same ID but after the diagnosis. I hope this makes sense, so sorry!
I am open to use any script you may recommend, I tried the above through just scouting different forums for potential code.

jrkrideau · July 22, 2022, 12:41pm

Hi Meg,

Thanks for the data but probably the best way to supply sample data is by using the dput() function. It gives an exact copy of your data.

Anyway, is this what you want?

df  <- structure(list(ID = c(1, 2, 3, 4), visitdate = structure(c(11028, 
       16044, 12926, 17492), class = "Date"), dateofdiagnosis = structure(c(10968, 
       NA, 16022, 16022), class = "Date"), Illness = c("Cancer", "None", 
       "Cancer", "Cancer")), row.names = c(NA, 4L), class = "data.frame")


## convert character variables to dates
library(lubridate)

df$visitdate  <- ymd(df$visitdate)
df$dateofdiagnosis  <- ymd(df$dateofdiagnosis)

## subset

df2  <- subset(df, df$visitdate >= df$dateofdiagnosis | is.na(dateofdiagnosis ))
df2

BTW 2013- 11-13 in your example has an extra space in it.

Meg23 · July 22, 2022, 1:06pm

Brilliant! Thank you so much, that code worked and was much simpler than I was making it. I just swapped the comparison around to use <= instead.
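For anyone reading later, the swapped version looks roughly like this. It is only a sketch: the sample data from the reply above is retyped with as.Date() for readability, and the lubridate step is left out because the columns are already dates here.

```r
df <- data.frame(
  ID = c(1, 2, 3, 4),
  visitdate = as.Date(c("2000-03-12", "2013-12-05", "2005-05-23", "2017-11-22")),
  dateofdiagnosis = as.Date(c("2000-01-12", NA, "2013-11-13", "2013-11-13")),
  Illness = c("Cancer", "None", "Cancer", "Cancer")
)

# Keep visits on or before the diagnosis date, plus rows with no diagnosis date
df2 <- subset(df, visitdate <= dateofdiagnosis | is.na(dateofdiagnosis))
df2
```

This keeps rows 2 and 3 and drops rows 1 and 4, which is what I was after.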

Thank you for the advice as well - I'm just picking up everything in R!

jrkrideau · July 22, 2022, 1:13pm

It looks like you have done coding before. Abandon all that you know! R is seriously weird if you are familiar with "normal" programming or even other stats packages.
