Hi.
I need help removing duplicate rows while keeping certain rows. My large dataset is similar to the following:
Animal<-c("Unknow","Dog","cat","Lion","unknow","horse","dog","unknow","cat")
A_date<-c("12-08-2020","20-06-2018","01-01-2015","10-07-2021","15-08-2019","05-08-2013","15-11-2016","22-03-2022","15-05-2019")
Mydata<-data.frame(Animal, A_date)
Mydata
  Animal     A_date
1 Unknow 12-08-2020
2    Dog 20-06-2018
3    cat 01-01-2015
4   Lion 10-07-2021
5 unknow 15-08-2019
6  horse 05-08-2013
7    dog 15-11-2016
8 unknow 22-03-2022
9    cat 15-05-2019
If the Animal column has duplicate rows, they should be removed based on the date. For example, in my data dog appears twice (as "Dog" and "dog"), and I want to keep only the row with the oldest date.
While removing the duplicate rows, I want to keep all the rows named Unknow, even if that name appears multiple times.
This is what I have tried so far:
library(data.table)
#alldata$D_soum<-as.Date(alldata$D_soum)
test1<-setDT(Mydata)[order(Animal, -as.IDate(A_date, "%Y-%m-%d"))][!duplicated(Animal)]
The result is not what I'm expecting.
How can I do this?
Thanks in advance
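
One possible way to express the rule described above, as a minimal sketch in base R rather than data.table. It rests on three assumptions that the post suggests but does not state outright: the dates are day-month-year (so the format string is "%d-%m-%Y", not "%Y-%m-%d"), animal names are compared ignoring case, and "unknow" in any capitalisation marks the rows that must always be kept. The helper names (parsed_date, name_key, is_unknown, keep) are illustrative only.

```r
# Example data from the question
Animal <- c("Unknow", "Dog", "cat", "Lion", "unknow", "horse", "dog", "unknow", "cat")
A_date <- c("12-08-2020", "20-06-2018", "01-01-2015", "10-07-2021", "15-08-2019",
            "05-08-2013", "15-11-2016", "22-03-2022", "15-05-2019")
Mydata <- data.frame(Animal, A_date)

# Assumption: dates are day-month-year
parsed_date <- as.Date(Mydata$A_date, format = "%d-%m-%Y")

# Assumption: names are compared ignoring case, so "Dog" and "dog" match
name_key <- tolower(Mydata$Animal)

# Assumption: any capitalisation of "unknow" marks a row that is always kept
is_unknown <- name_key == "unknow"

# Sort by name, then by date ascending, so the oldest row of each name comes first
ord <- order(name_key, parsed_date)

# Original row indices of the first (oldest) row for each name
first_idx <- ord[!duplicated(name_key[ord])]

# Keep every "unknow" row plus the oldest row of every other name
keep <- is_unknown
keep[first_idx] <- TRUE

result <- Mydata[keep, ]
print(result)
```

On the sample data this keeps rows 1, 3, 4, 5, 6, 7 and 8: all three Unknow/unknow rows, the 2015 cat, the 2016 dog, and the single Lion and horse rows. Building a logical keep vector on the original indices leaves the rows in their original order. The same idea should carry over to data.table by ordering on the parsed date in ascending order (without the minus sign) and excluding the "unknow" rows from the duplicated() filter.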