Select with exceptions plus pivot

Slavek · March 28, 2022, 1:55pm

Hi,
I have this simple df:

source <- data.frame(
   stringsAsFactors = FALSE,
                              URN = c("21GB01293040","21GB01240221",
                                      "21GB03294610","21GB01309069","21GB03078286",
                                      "21GB01086060","21GB01169525","21GB01209144",
                                      "21GB01204925","21GB90063865",
                                      "21GB01068838","21GB01176411","21GB01215531"),
   Q5_1 = c(0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1),
   Q5_2 = c(0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0),
   Q5_Top2 = c(0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0),
   Q5_ThreeLevels = c(1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0),
               QA31 = c(7, 1, 8, 3, 8, 8, 8, 6, 2, 2, 7, 8, 8),
  Qtr = c(1, 1, 2, 2, 1, 2, 2, 1, 2, 2, 1, 2, 2)
              )

Q5_1 and Q5_2 are multi-response questions and I would like to take only them into account preparing a table with Count and Proportion. I have prepared this:

library(dplyr)
idea <- source %>% 
  select(starts_with("Q5_") & (!ends_with("Levels") | !ends_with("Top2"))) %>% 
  summarise_all(list(Count = sum, Proportion = mean)) %>% 
  pivot_longer(everything(), names_to = c("Category", "summary"), names_sep = "_", "value") %>% 
  pivot_wider(names_from = summary, values_from = value)

but I have the following error:

Warning messages:
1: Expected 2 pieces. Additional pieces discarded in 8 rows [1, 2, 3, 4, 5, 6, 7, 8]. 
2: Values are not uniquely identified; output will contain list-cols.
* Use `values_fn = list` to suppress this warning.
* Use `values_fn = length` to identify where the duplicates arise
* Use `values_fn = {summary_fun}` to summarise duplicates

What am I doing wrong?
I have done something similar and all works quite smoothly:

source <- data.frame(
   stringsAsFactors = FALSE,
                              URN = c("21GB01293040","21GB01240221",
                                      "21GB03294610","21GB01309069","21GB03078286",
                                      "21GB01086060","21GB01169525","21GB01209144",
                                      "21GB01204925","21GB90063865",
                                      "21GB01068838","21GB01176411","21GB01215531"),
  TMC.Communication = c(0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1),
      TMC.Paperwork = c(0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0),
       TMC.Electric = c(0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0),
          TMC.Other = c(1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0),
                QA1 = c(9, 1, 8, 4, 8, 6, 7, 6, 2, 3, 6, 10, 8),
               QA31 = c(7, 1, 8, 3, 8, 8, 8, 6, 2, 2, 7, 8, 8),
  Qtr = c(1, 1, 2, 2, 1, 2, 2, 1, 2, 2, 1, 2, 2)
              )

library(dplyr)
library(tidyr)
idea <- source %>% 
  select(starts_with("TMC.")) %>% 
  summarise_all(list(Count = sum, Proportion = mean)) %>% 
  pivot_longer(everything(), names_to = c("Category", "summary"), names_sep = "_", "value") %>% 
  pivot_wider(names_from = summary, values_from = value)

What am I doing wrong?

nirgrahamuk · March 29, 2022, 10:11am

your code that manipulates the names seperated with _ seems to assume 2 levels of info, when there are 3. i added the third as 'more'

source %>% 
    select(starts_with("Q5_") & (!ends_with("Levels") | !ends_with("Top2"))) %>% 
    summarise_all(list(Count = sum, Proportion = mean)) %>% 
    pivot_longer(everything(), names_to = c("Category", "summary","more"), names_sep = "_", "value")  %>% 
    pivot_wider(names_from = summary, values_from = value)

Slavek · March 29, 2022, 10:28am

Thank you.
Now, I am trying to add groupping but it is not working:

idea2 <- source %>% 
  select(Qtr, (starts_with("Q5_") & (!ends_with("Levels") | !ends_with("Top2")))) %>% 
  group_by(Qtr) %>% 
  summarise_all(list(Count = sum, Proportion = mean)) %>% 
  pivot_longer(everything(), names_to = c("Category", "summary","more"), names_sep = "_", "value")  %>% 
  pivot_wider(names_from = summary, values_from = value)

Warning messages:
1: Expected 3 pieces. Missing pieces filled with `NA` in 1 rows [1]. 
2: Values are not uniquely identified; output will contain list-cols.
* Use `values_fn = list` to suppress this warning.
* Use `values_fn = length` to identify where the duplicates arise
* Use `values_fn = {summary_fun}` to summarise duplicates

Do you think that "more" impact the code?

nirgrahamuk · March 29, 2022, 10:34am

since you are grouping on Qtr, it is no longer appropriate to pivot_longer on everything, rather
pivot_longer on everything except Qtr
i.e.
pivot_longer(-Qtr,

Slavek · March 29, 2022, 10:55am

Hi, sorry but I think the code (or my misinterpretation) converted the table.
If you look my code with TMC, all variables starting from TMC are in the first column, count in the second and proportion in the third.
My example with Q5 (almost the same df) converts the table...
Basically, I need the same as this:

library(dplyr)
library(tidyr)
idea <- source %>% 
  select(starts_with("TMC.")) %>% 
  summarise_all(list(Count = sum, Proportion = mean)) %>% 
  pivot_longer(everything(), names_to = c("Category", "summary"), names_sep = "_", "value") %>% 
  pivot_wider(names_from = summary, values_from = value)

but with this:

select(starts_with("Q5_") & (!ends_with("Levels") | !ends_with("Top2"))) %>%

in two versions:

Overall
Grouped by Qtr

andresrcs · March 30, 2022, 12:53am

Is this what you are trying to accomplish?

library(tidyverse)

source <- data.frame(
    stringsAsFactors = FALSE,
    URN = c("21GB01293040","21GB01240221",
            "21GB03294610","21GB01309069","21GB03078286",
            "21GB01086060","21GB01169525","21GB01209144",
            "21GB01204925","21GB90063865",
            "21GB01068838","21GB01176411","21GB01215531"),
    Q5_1 = c(0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1),
    Q5_2 = c(0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0),
    Q5_Top2 = c(0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0),
    Q5_ThreeLevels = c(1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0),
    QA31 = c(7, 1, 8, 3, 8, 8, 8, 6, 2, 2, 7, 8, 8),
    Qtr = c(1, 1, 2, 2, 1, 2, 2, 1, 2, 2, 1, 2, 2)
)

source %>% 
    select(matches("^Q5_\\d")) %>% 
    summarise(across(.fns = list(Count = sum, Proportion = mean))) %>% 
    pivot_longer(everything(),
                 names_to = c("Category", "summary"),
                 names_sep = "(?<=_\\d)_",
                 values_to = "value") %>% 
    pivot_wider(names_from = summary,
                values_from = value)
#> # A tibble: 2 × 3
#>   Category Count Proportion
#>   <chr>    <dbl>      <dbl>
#> 1 Q5_1         5      0.385
#> 2 Q5_2         4      0.308

^{Created on 2022-03-29 by the reprex package (v2.0.1)}

If so, the problem is the way you are using the names_sep argument

system · April 6, 2022, 12:14pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.