I have a data frame like this
example=as.data.frame(cbind(x=c(0,0,1,1),x2=c('no','yes','yes','yes'),
x3=c('no','yes','yes','no'),x4=c(1,0,1,1))).
The data frame is of 9000000 rows and 380 columns(x1~x380).
Is there any efficient way to see which rows have only 0 or 'no'? Also, I hope to compute the proportion of 0 and 'no' for each row (in the above case, I want to create a new column: 0.75 0.50 0.00 0.25). I there any way to achieve these goals in one minute?
For mutate, I can only point you to the function case_when since I don't know your decision criteria.
My experience on a 4MB Macbook Air is that doing that many rows will take more than a minute or choke, less on an 8MB Dell under Ubuntu. I haven't tried that on my 32MB desktop.
Note that i have added stringsAsFactors = FALSE... I also added one row to your example, so you can have a column with only 0 and 'no' and see the difference in the output
Now you can get a vector indicating a logical value TRUE if the row only have 0 or 'no'
So the last row is the one that matches your criteria.
To calculate the proportion of 0 and 'no' of each row, we follow the same way, but now, we assign it to a new variable of example:
example$proportions <- apply(example, 1, function(x) mean(x == 0 | x == 'no'))
> example
x x2 x3 x4 proportions
1 0 no no 1 0.75
2 0 yes yes 0 0.50
3 1 yes yes 1 0.00
4 1 yes no 1 0.25
5 0 no no 0 1.00