See the FAQ: How to do a minimal reproducible example reprex for beginners. Being able to cut-and-paste a data frame is an advantage to attracting answers.
Every R problem can be thought of with advantage as the interaction of three objects— an existing object, x , a desired object,y , and a function, f, that will return a value of y given x as an argument. In other words, school algebra— f(x) = y. Any of the objects can be composites.
In this case, PersonDedupe2 (which I'll refer to as dat) plays the role of x. It's a matrix-like object (a two-dimensional array of all numeric values) that may be of class matrix, data.frame or tibble.
y is the desired output, which I'll refer to as out, which will also be a matrix-like object consisting of 5 variables—one for PersonID (which I'll refer to as id) and one for each of feb, mar, apr and may.
f is the function to be composed that populates out. Let's start with a single variable in dat, February.
The first row of dat illustrates one outcome—the current and prior months are both zero, which is an unprovided for case in the problem description, a subject with no recorded activity in the past or current month. I'll encode that case as NA.
The second row records a subject with non-zero activity in both the current and prior month, so that can be encoded as existing if the activity is equal each month, otherwise changed.
Not shown for February is the case in which the current month is 1 or more and the prior month is 0, so that can be encoded as new.
With that in mind, the following code tests each pair of column vectors and sets the appropriate encoding (user to verify) in a new data frame. The code is hardwired, which is poor practice when programming for a recurring need—object names, column positions and encoding should all be parameterized.
# for reproducibility
set.seed(137)
# assortment of integers
basket <- c(rep(0,50),rep(1:4,15))
# simulated data
dat <- data.frame(id = sample(30000:40000,25),
Jan = sample(basket,25),
Feb = sample(basket,25),
Mar = sample(basket,25),
Apr = sample(basket,25),
May = sample(basket,25))
# receiver data frame same id value as dat; otherwise all NA
out <- data.frame(id = dat[,1],
dummy = rep(NA,25), # convenience for ease of indexing
feb = rep(NA,25),
mar = rep(NA,25),
apr = rep(NA,25),
may = rep(NA,25))
# gets row indices
get_row_indices <- function(x) {
nas = which(dat[,x] == 0 & dat[,x-1] == 0)
news = which(dat[,x] > 0 & dat[,x-1] == 0)
ended = which(dat[,x] == 0 & dat[,x-1] > 0)
changed = which(dat[,x] > 0 & dat[,x-1] > 0)
return(list(nas,news,ended,changed))
}
# uses row indices to change the out data frame
populate_out <- function(x) {
idx = get_row_indices(x)
out[idx[[1]],x] = NA
out[idx[[2]],x] = "new"
out[idx[[3]],x] = "exited"
out[idx[[4]],x] = "changed"
return(out)
}
# loops over the month columns after January
for (i in 3:6) out = populate_out(i)
# removes dummy column
out <- out[,-2]
out
#> id feb mar apr may
#> 1 39786 exited new exited <NA>
#> 2 30892 changed exited new changed
#> 3 37534 <NA> new exited <NA>
#> 4 31446 exited new changed exited
#> 5 37802 exited <NA> new changed
#> 6 30220 new changed exited new
#> 7 33562 new exited <NA> <NA>
#> 8 38098 new changed changed changed
#> 9 32171 <NA> <NA> new changed
#> 10 35609 exited <NA> <NA> new
#> 11 30064 changed changed changed exited
#> 12 37915 new exited new changed
#> 13 31437 <NA> new changed changed
#> 14 31583 exited new changed changed
#> 15 38205 changed exited <NA> new
#> 16 33307 exited new exited <NA>
#> 17 33459 exited <NA> new exited
#> 18 38069 exited <NA> <NA> new
#> 19 34731 exited <NA> new exited
#> 20 38152 <NA> new changed exited
#> 21 30415 changed exited new changed
#> 22 35516 changed changed exited new
#> 23 39822 changed changed changed exited
#> 24 31037 changed changed exited <NA>
#> 25 33234 <NA> new exited new