I have a data frame, dfsales, which I have converted into long format as dfsaleslong.
dfsales <- data.frame(
  subjectid = c("a","b","c","d","e","f","g","h","i","j","k"),
  location = c("NY","NC","WA","WA","OR","CA","AR","KS","AZ","VT","MA"),
  month1 = c(NA,NA,1,0,0,2,1,1,0,0,0),
  month2 = c(NA,NA,0,0,0,0,NA,0,0,0,NA),
  month3 = c(NA,1,0,1,0,0,0,1,NA,NA,NA),
  month4 = c(0,0,0,0,0,1,2,0,1,NA,0),
  month5 = c(NA,NA,NA,NA,NA,NA,0,1,1,2,0),
  month6 = c(NA,NA,0,0,0,NA,NA,0,0,0,0),
  month7 = c(0,0,0,0,0,0,NA,0,0,0,0),
  goods1 = c(1,2,1,2,0,0,1,2,2,1,0),
  goods2 = c(0,0,1,2,1,1,2,2,1,0,0),
  goods3 = c(0,1,2,1,1,NA,2,1,2,1,NA),
  goods4 = c(0,1,2,1,1,1,2,2,NA,NA,NA),
  goods5 = c(0,1,0,1,1,1,2,2,1,NA,NA),
  goods6 = c(0,1,2,1,1,1,2,2,0,0,0),
  goods7 = c(NA,1,1,1,1,1,2,2,2,NA,NA),
  complain1 = c(0,0,0,0,0,0,0,0,0,0,0),
  complain2 = c(0,0,0,0,0,0,0,0,0,0,0),
  complain3 = c(0,1,0,0,0,0,1,0,0,0,0),
  complain4 = c(0,0,0,0,0,0,0,0,0,1,1),
  complain5 = c(0,0,0,0,0,0,0,0,0,0,0),
  complain6 = c(0,0,0,0,0,0,1,0,0,1,0),
  complain7 = c(0,2,0,0,0,0,2,0,0,0,1)
)
library(tidyr)
#> Warning: package 'tidyr' was built under R version 3.6.2
dfsaleslong <- pivot_longer(
  data = dfsales,
  cols = month1:complain7,
  names_pattern = "([^\\d]+)(\\d+)",
  names_to = c(".value", "month_number")
)
Created on 2021-12-20 by the reprex package (v2.0.1)
The column "complain" takes three values (0, 1, 2). I want to create a function that works per subject: once the value of complain is 1 or 2 (meaning the subject has received a complaint), all of that individual's rows from that point on should be deleted and ignored.
A subject whose complain value is 1 in one month can go back to 0 in later months, so simply deleting the rows where complain is 1 or 2 does not work.
Could someone help me do this?
FYI, this is how I want my final data to look (created manually):
dfsaleslong2 <- dfsaleslong[-c(10:14,45:49,67:70,74:77),]
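For reference, one possible way to express the rule described above is a grouped cumulative sum. This is only a minimal sketch, not a definitive solution: it assumes the dplyr package is available, that rows are already ordered by month within each subject (as pivot_longer leaves them), and that the first complaint row itself should be dropped, as in the manual subset. The function name drop_from_first_complaint and the toy data are hypothetical, made up for illustration.

```r
library(dplyr)

# Hypothetical helper (the name is an assumption, not from the question).
# For each subject, keep only the rows that come before the first month
# in which complain is 1 or 2. The running count of complaints is 0 until
# the first complaint and >= 1 from that row onward, even if complain
# later returns to 0.
drop_from_first_complaint <- function(data) {
  data %>%
    group_by(subjectid) %>%
    filter(cumsum(complain %in% c(1, 2)) == 0) %>%
    ungroup()
}

# Small toy input in the same long layout as dfsaleslong
toy <- data.frame(
  subjectid    = rep(c("a", "b"), each = 4),
  month_number = rep(as.character(1:4), times = 2),
  complain     = c(0, 0, 0, 0,   0, 1, 0, 2)
)

# Subject "a" never complains and keeps all four rows;
# subject "b" keeps only month 1 (the complaint is in month 2).
result <- drop_from_first_complaint(toy)
print(result)
```

Applied to the real data, this would be called as dfsaleslong2 <- drop_from_first_complaint(dfsaleslong). If the row with the first complaint should be kept and only the later rows dropped, the filter condition could instead be lag(cumsum(complain %in% c(1, 2)), default = 0) == 0.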