I have two data frame, this is just a sample , database have approx 1 million of records.
data1<-data.frame(
'External ID' = c(86364,"ARV_2612","AGH_2212","IND_2622","CHG_2622"),
sector = c(3,3,1,2,5),
col1=c(1,1,0,0,0),
'Enternal code'=c(1,1,1,1,3),
col3=c(1,1,0,0,0),
col4=c(1,0,0,0,0),
col5=c(1,0,1,1,1)
)
data2<-data.frame(
'External ID' = c(53265,"ARV_7362",76354,"IND_2622","CHG_9762"),
sector = c(3,3,1,2,5),
col1=c(1,1,0,0,0),
'Enternal code'=c(1,1,1,1,3),
col3=c(1,1,0,0,0),
col4=c(1,0,0,0,0),
col5=c(1,0,1,1,1)
)
new to R,now i am looking for a approach to mutate my one data frame (data2) like. the code while will automatically find the column "External ID", then add a new column in data2 like duplicate and this column will show Y or N if the external_id in data2 present in data1(external ID) any simple solution....?? the output should be like
External.ID | sector | col1 | Enternal.code | col3 | col4 | col5 | duplicate |
---|---|---|---|---|---|---|---|
53265 | 3 | 1 | 1 | 1 | 1 | 1 | N |
ARV_7362 | 3 | 1 | 1 | 1 | 0 | 0 | N |
76354 | 1 | 0 | 1 | 0 | 0 | 1 | N |
IND_2622 | 2 | 0 | 1 | 0 | 0 | 1 | Y |
CHG_9762 | 5 | 0 | 3 | 0 | 0 | 1 | N |