Tagging Duplicates

Dear All,

This is a very simple problem that I was able to solve easily in Stata. I need to tag duplicate entries based on two variables.
I have two variables: villageid (unique IDs for different communities) and attendance (which takes values such as 1, 2, 3 or 4). I need to find duplicate entries of villageid within each attendance value, if any exist. Is there a way I can tag them?

Best,
Kumar Ashwarya

Hi,

Is this what you're looking for?

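library("dplyr")
# Example data: 50 rows of randomly drawn village IDs and attendance values
myData <- data.frame(villageid = sample(1:10, 50, replace = TRUE), attendance = sample(1:5, 50, replace = TRUE))
# One row per (villageid, attendance) pair; duplicate is TRUE when the pair occurs more than once
duplicates <- myData %>% group_by(villageid, attendance) %>% summarise(duplicate = n() > 1) %>% ungroup()
# Attach the flag to every original row and sort for easier inspection
myData <- left_join(myData, duplicates, by = c("villageid", "attendance")) %>% arrange(villageid, attendance)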
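
If you would rather get something close to what Stata's duplicates tag produces (a count of the other rows sharing the same key, 0 for unique rows), a grouped mutate() avoids the separate join. This is only a minimal sketch: the data frame villages, its values, and the column names dup_tag and is_dup are made up for illustration.

```r
library(dplyr)

# Small hypothetical data set (values invented for illustration)
villages <- data.frame(
  villageid  = c(1, 1, 2, 2, 3, 3, 3),
  attendance = c(1, 1, 1, 2, 4, 4, 4)
)

# For every row, count how many OTHER rows share the same
# (villageid, attendance) pair; 0 means the row is unique
tagged <- villages %>%
  group_by(villageid, attendance) %>%
  mutate(dup_tag = n() - 1L) %>%
  ungroup()

# Base R alternative: a logical flag that is TRUE for every member
# of a duplicated pair, including its first occurrence
key <- villages[, c("villageid", "attendance")]
villages$is_dup <- duplicated(key) | duplicated(key, fromLast = TRUE)
```

The grouped version keeps the original row order, and filter(tagged, dup_tag > 0) then lists only the duplicated entries.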

Greetings,
PJ
