I have the following data from an MTurk study:
data.frame(
Random.ID = c(46392L,91734L,98884L,50989L,92380L,
32805L,85910L,83298L,28722L,60690L),
CRSBCIS = c(4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L),
CRSCAPE = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
CRSCAPE2 = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
CRSCMQ = c(11L, 11L, 11L, 11L, 11L, 11L, 10L, 11L, 11L, 11L),
CRSDemo = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
CRSDPB = c(8L, 8L, 8L, 8L, 8L, 8L, 10L, 8L, 8L, 8L),
CRSDUQ = c(3L, 3L, 3L, 3L, 3L, 3L, 5L, 3L, 3L, 3L),
CRSDUQ2 = c(2L, 2L, 2L, 2L, 2L, 2L, 6L, 2L, 2L, 2L),
CRSGCBS = c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L),
CRSIDI = c(13L, 13L, 13L, 13L, 7L, 13L, 13L, 13L, 13L, 13L),
CRSIDI2 = c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L),
CRSNFC = c(4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L),
CRSTSRQ = c(7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L)
)
Running studies on MTurk requires figuring out which participants are bots or random responders and which provided real, good-effort data. The first column in this data frame is the participant ID. I need it preserved in a final data frame consisting of an ID column and a validity variable (example at the end), which could be coded 0/1 or in whatever way makes it clear which data to toss, which participants to pay for real work, and which to reject. Once I have this sorted out, we are going to open the floodgates and run hundreds of participants.
The other columns come from a method of screening out bots/random responders using the Conscientious Responders Scale (if you're curious: https://journals.sagepub.com/doi/pdf/10.1177/2158244014545964).
Each question reads something like "To answer this question, choose 'All of the above'", which is coded as 4 in the case of the second variable in the data frame (CRSBCIS). Each questionnaire gets one or two of these items depending on its length. I need to create a new variable that operationalizes valid responding as >= ~80% correct responses across these variables (columns 2 through 14).
The correct answers to the variables, in order from 2 through 14 are: (4,1,2,1,1,8,3,2,3,13,3,4,7).
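To make the scoring rule concrete, here is a minimal sketch of the key and the 80% cutoff applied to a single participant. The names `key`, `min_correct`, and `resp` are only illustrative, and the key is copied exactly as listed above (it gives 1 for CRSCMQ, whereas the sample rows mostly contain 11).

```r
# Answer key for columns 2 through 14, copied exactly as listed in the post.
# Naming the elements ties each keyed answer to its column.
key <- c(CRSBCIS = 4, CRSCAPE = 1, CRSCAPE2 = 2, CRSCMQ = 1, CRSDemo = 1,
         CRSDPB = 8, CRSDUQ = 3, CRSDUQ2 = 2, CRSGCBS = 3, CRSIDI = 13,
         CRSIDI2 = 3, CRSNFC = 4, CRSTSRQ = 7)

# With 13 items, an 80% cutoff works out to at least 11 correct
# (10/13 is about 0.77, 11/13 is about 0.85).
min_correct <- ceiling(0.80 * length(key))

# Responses of participant 85910 (seventh row of the sample data).
resp <- c(4, 1, 2, 10, 1, 10, 5, 6, 3, 13, 3, 4, 7)

# Element-wise comparison against the key, then count the matches.
n_correct <- sum(resp == key)
valid <- as.integer(n_correct >= min_correct)
```

Under these assumptions, this participant matches the key on 9 of 13 items and would be flagged as invalid.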
An example of my ideal final data frame would look something like this:
data.frame(
Random.ID = c(46392L,91734L,98884L,50989L,92380L,
32805L,85910L,83298L,28722L,60690L),
Valid = c(0L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 0L)
)
I think this can be done by creating a new empty variable and then within a loop checking to see if these variables are answered correctly, adding 1 to that variable if they are, moving on to the next variable and repeating this process for all columns. Then that number would be divided by 13. I'm not new to R, but I have very little experience writing loops and am not sure where to start.
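The loop idea described above might look roughly like the sketch below. It is only an illustration under some assumptions: the data frame is called `dat` (a hypothetical name), it uses just three of the ten sample participants to stay short, and the key is used exactly as listed in the post. Because `==` is vectorized in R, the loop runs over the 13 columns and scores all participants at once in each pass.

```r
# Three participants taken from the sample data (rows 1, 5, and 7).
dat <- data.frame(
  Random.ID = c(46392L, 92380L, 85910L),
  CRSBCIS = c(4L, 4L, 4L),
  CRSCAPE = c(1L, 1L, 1L),
  CRSCAPE2 = c(2L, 2L, 2L),
  CRSCMQ = c(11L, 11L, 10L),
  CRSDemo = c(1L, 1L, 1L),
  CRSDPB = c(8L, 8L, 10L),
  CRSDUQ = c(3L, 3L, 5L),
  CRSDUQ2 = c(2L, 2L, 6L),
  CRSGCBS = c(3L, 3L, 3L),
  CRSIDI = c(13L, 7L, 13L),
  CRSIDI2 = c(3L, 3L, 3L),
  CRSNFC = c(4L, 4L, 4L),
  CRSTSRQ = c(7L, 7L, 7L)
)

# Correct answers for columns 2 through 14, in order, as listed in the post.
key <- c(4, 1, 2, 1, 1, 8, 3, 2, 3, 13, 3, 4, 7)

# Start every participant at zero correct answers.
n_correct <- rep(0, nrow(dat))

# For each keyed column, add 1 for every participant who answered correctly.
# Column j + 1 of dat corresponds to key[j] because column 1 is the ID.
for (j in seq_along(key)) {
  n_correct <- n_correct + (dat[[j + 1]] == key[j])
}

# Convert counts to proportions and apply the 80% cutoff.
prop_correct <- n_correct / length(key)
result <- data.frame(
  Random.ID = dat$Random.ID,
  Valid = as.integer(prop_correct >= 0.80)
)
```

With the key exactly as listed, these three participants get 12, 11, and 9 items correct, so the first two are coded `Valid = 1` and the third `Valid = 0`.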
Thank you in advance for any help!