How to order & count the number of duplicates for each unique value in R?

d.arlia · May 24, 2021, 1:26pm

I am working with scraped housing-market data, and my dataset contains duplicates, which I identify conditionally on a set of characteristics. Suppose observations A, B, and C are identical in every characteristic except the rent value and the date (with dateA < dateB < dateC). I then treat B and C as duplicates of A, because the ad was simply re-posted on the webpage a second and then a third time to re-rent the property.

I have created a column that flags whether each observation is a duplicate, so it holds a sequence of values such as "FALSE" "TRUE" "FALSE" "FALSE" "TRUE".
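For readers following along, one way such a flag could be built is with base R's duplicated(). This is only a sketch under assumptions: the data frame `ads` and the columns `x1`, `x2`, `x3`, `price`, and `date` are hypothetical stand-ins for the real characteristics, not the actual dataset.

```r
# Hypothetical listings: x1..x3 stand in for the characteristics
# that define "the same ad"; price and date are allowed to differ.
ads <- data.frame(
  x1    = c(1, 1, 1, 2),
  x2    = c(1, 1, 1, 2),
  x3    = c(1, 1, 1, 2),
  price = c(1000, 2000, 2100, 1500),
  date  = as.Date(c("2021-01-01", "2021-02-01", "2021-03-01", "2021-02-01"))
)

# Sort by date so the earliest posting of each ad counts as the original.
ads <- ads[order(ads$date), ]

# duplicated() marks every repeat of an earlier row, never the first occurrence.
ads$dupflag <- duplicated(ads[, c("x1", "x2", "x3")])
```

Sorting by date first matters here: duplicated() treats whichever matching row comes first as the original, so the earliest ad stays FALSE and its re-postings become TRUE.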

I would like to create another column that gives, for each "TRUE", its ordinal position among the re-postings. In my example, observation B would get the value 2, observation C the value 3, and so on.

Is there a way to do so in R?

Thanks.

nirgrahamuk · May 24, 2021, 2:35pm

library(tidyverse)
(exmpldf <- tribble(~x1,~x2,~x3,~price,~datenum,~dupflag,
                   1,1,1,1000,1,TRUE,
                   1,1,1,2000,2,TRUE,
                   1,1,1,2100,3,TRUE,
                   2,2,2,1500,2,FALSE,
                   3,3,3, 900,4,FALSE,
                   4,4,4, 1300,2,TRUE,
                   4,4,4, 1350,4,TRUE))

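# Number the rows within each group of identical characteristics, oldest ad first;
# rows that are not flagged as duplicates get NA.
exmpldf %>%
  arrange(datenum) %>%
  group_by(x1, x2, x3) %>%
  mutate(r = if_else(dupflag, row_number(), NA_integer_)) %>%
  ungroup()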
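If the flag is FALSE for the first posting, as described in the question, the same idea also works without the tidyverse. The following is a base R sketch, not the approach used above: it swaps in ave() with seq_along() for the grouped row_number(), and the data frame `ads` with its single grouping column `x1` is hypothetical.

```r
# Hypothetical data mirroring the question: the first three rows are
# observations A, B, C (same characteristics); only A is the original.
ads <- data.frame(
  x1      = c(1, 1, 1, 2, 4, 4),
  datenum = c(1, 2, 3, 2, 2, 4),
  dupflag = c(FALSE, TRUE, TRUE, FALSE, FALSE, TRUE)
)

# Order by group and date so the counter follows posting order.
ads <- ads[order(ads$x1, ads$datenum), ]

# ave() applies seq_along() within each group, giving 1, 2, 3, ... per group.
ads$ord <- ave(seq_along(ads$x1), ads$x1, FUN = seq_along)

# Keep the ordinal only for rows flagged as duplicates.
ads$ord[!ads$dupflag] <- NA

ads$ord
# NA 2 3 NA NA 2
```

With more than one characteristic, the extra columns can be passed to ave() as additional grouping arguments.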
