I'm trying to find a more efficient way to run a function that counts the number of favorable responses across several columns for each row. As you can see in my reprex, right now I use a work around where I set a unique value for each row (using row_number), and then grouping by that value. While this works for this example set, it isn't very efficient, and is slow when I use on my real dataset of 25 variables and 200,000 cases.
Is there a more efficient way to get the same output? Maybe something with pmap? But I wasn't quite able to grasp how to do this, but I don't yet have experience in purrr. I'm at that stage in learning R where I can generally get the desired outcome, but not in the most efficient way.
library(tidyverse)
#> Warning: package 'stringr' was built under R version 3.4.4
#> Warning: package 'forcats' was built under R version 3.4.4
#simplfied example dataset
testing_by_row <-
structure(list(group = c("a", "a", "b", "b", "c", "c"), Q01 = structure(c(1L,
1L, 2L, 3L, 3L, NA), .Label = c("F", "U", "N"), class = "factor"),
Q02 = structure(c(2L, 1L, NA, 3L, 1L, 1L), .Label = c("F",
"U", "N"), class = "factor"), Q03 = structure(c(3L, 1L, 1L,
2L, 3L, 3L), .Label = c("F", "U", "N"), class = "factor"),
Q04 = structure(c(1L, 1L, 1L, 2L, 2L, 3L), .Label = c("F",
"U", "N"), class = "factor"), Q05 = structure(c(3L, 3L, NA,
3L, 1L, 2L), .Label = c("F", "U", "N"), class = "factor"),
Q06 = structure(c(2L, 2L, 1L, 3L, 1L, 2L), .Label = c("F",
"U", "N"), class = "factor")), .Names = c("group", "Q01",
"Q02", "Q03", "Q04", "Q05", "Q06"), class = c("tbl_df", "tbl",
"data.frame"), row.names = c(NA, -6L))
#Function to calc count fav for index
countF <- function(x) {
sum(x == 1, na.rm = TRUE)
}
countN <- function(x) {
sum(x == 1 | x == 2 | x == 3, na.rm = TRUE)
}
#Counts the #Fav for each index - and the total that answered Fav, Unfav, or Neutral. Note the questions in the index are not always consecutive (e.g. index 1 is Q01, Q02, and Q04).
testing_by_row %>%
mutate(Par_ID = row_number()) %>%
group_by(Par_ID) %>%
mutate(Index1_Fav = countF(c(Q01, Q02, Q04)),
Index1_N = countN(c(Q01, Q02, Q04)),
Index2_Fav = countF(c(Q05, Q06)),
Index2_N = countN(c(Q05, Q06)))
#> # A tibble: 6 x 12
#> # Groups: Par_ID [6]
#> group Q01 Q02 Q03 Q04 Q05 Q06 Par_ID Index1_Fav Index1_N
#> <chr> <fct> <fct> <fct> <fct> <fct> <fct> <int> <int> <int>
#> 1 a F U N F N U 1 2 3
#> 2 a F F F F N U 2 3 3
#> 3 b U <NA> F F <NA> F 3 1 2
#> 4 b N N U U N N 4 0 3
#> 5 c N F N U F F 5 1 3
#> 6 c <NA> F N N U U 6 1 2
#> # ... with 2 more variables: Index2_Fav <int>, Index2_N <int>
Created on 2018-03-27 by the reprex package (v0.2.0).