Select with exceptions plus pivot

Hi,
I have this simple df:

source <- data.frame(
   stringsAsFactors = FALSE,
                              URN = c("21GB01293040","21GB01240221",
                                      "21GB03294610","21GB01309069","21GB03078286",
                                      "21GB01086060","21GB01169525","21GB01209144",
                                      "21GB01204925","21GB90063865",
                                      "21GB01068838","21GB01176411","21GB01215531"),
   Q5_1 = c(0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1),
   Q5_2 = c(0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0),
   Q5_Top2 = c(0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0),
   Q5_ThreeLevels = c(1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0),
               QA31 = c(7, 1, 8, 3, 8, 8, 8, 6, 2, 2, 7, 8, 8),
  Qtr = c(1, 1, 2, 2, 1, 2, 2, 1, 2, 2, 1, 2, 2)
              )

Q5_1 and Q5_2 are multi-response questions and I would like to take only them into account preparing a table with Count and Proportion. I have prepared this:

library(dplyr)
idea <- source %>% 
  select(starts_with("Q5_") & (!ends_with("Levels") | !ends_with("Top2"))) %>% 
  summarise_all(list(Count = sum, Proportion = mean)) %>% 
  pivot_longer(everything(), names_to = c("Category", "summary"), names_sep = "_", "value") %>% 
  pivot_wider(names_from = summary, values_from = value)

but I have the following error:

Warning messages:
1: Expected 2 pieces. Additional pieces discarded in 8 rows [1, 2, 3, 4, 5, 6, 7, 8]. 
2: Values are not uniquely identified; output will contain list-cols.
* Use `values_fn = list` to suppress this warning.
* Use `values_fn = length` to identify where the duplicates arise
* Use `values_fn = {summary_fun}` to summarise duplicates 

What am I doing wrong?
I have done something similar and all works quite smoothly:

source <- data.frame(
   stringsAsFactors = FALSE,
                              URN = c("21GB01293040","21GB01240221",
                                      "21GB03294610","21GB01309069","21GB03078286",
                                      "21GB01086060","21GB01169525","21GB01209144",
                                      "21GB01204925","21GB90063865",
                                      "21GB01068838","21GB01176411","21GB01215531"),
  TMC.Communication = c(0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1),
      TMC.Paperwork = c(0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0),
       TMC.Electric = c(0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0),
          TMC.Other = c(1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0),
                QA1 = c(9, 1, 8, 4, 8, 6, 7, 6, 2, 3, 6, 10, 8),
               QA31 = c(7, 1, 8, 3, 8, 8, 8, 6, 2, 2, 7, 8, 8),
  Qtr = c(1, 1, 2, 2, 1, 2, 2, 1, 2, 2, 1, 2, 2)
              )

library(dplyr)
library(tidyr)
idea <- source %>% 
  select(starts_with("TMC.")) %>% 
  summarise_all(list(Count = sum, Proportion = mean)) %>% 
  pivot_longer(everything(), names_to = c("Category", "summary"), names_sep = "_", "value") %>% 
  pivot_wider(names_from = summary, values_from = value)

What am I doing wrong?

your code that manipulates the names seperated with _ seems to assume 2 levels of info, when there are 3. i added the third as 'more'

source %>% 
    select(starts_with("Q5_") & (!ends_with("Levels") | !ends_with("Top2"))) %>% 
    summarise_all(list(Count = sum, Proportion = mean)) %>% 
    pivot_longer(everything(), names_to = c("Category", "summary","more"), names_sep = "_", "value")  %>% 
    pivot_wider(names_from = summary, values_from = value)

Thank you.
Now, I am trying to add groupping but it is not working:

idea2 <- source %>% 
  select(Qtr, (starts_with("Q5_") & (!ends_with("Levels") | !ends_with("Top2")))) %>% 
  group_by(Qtr) %>% 
  summarise_all(list(Count = sum, Proportion = mean)) %>% 
  pivot_longer(everything(), names_to = c("Category", "summary","more"), names_sep = "_", "value")  %>% 
  pivot_wider(names_from = summary, values_from = value)

Warning messages:
1: Expected 3 pieces. Missing pieces filled with `NA` in 1 rows [1]. 
2: Values are not uniquely identified; output will contain list-cols.
* Use `values_fn = list` to suppress this warning.
* Use `values_fn = length` to identify where the duplicates arise
* Use `values_fn = {summary_fun}` to summarise duplicates

Do you think that "more" impact the code?

since you are grouping on Qtr, it is no longer appropriate to pivot_longer on everything, rather
pivot_longer on everything except Qtr
i.e.
pivot_longer(-Qtr,

1 Like

Hi, sorry but I think the code (or my misinterpretation) converted the table.
If you look my code with TMC, all variables starting from TMC are in the first column, count in the second and proportion in the third.
My example with Q5 (almost the same df) converts the table...
Basically, I need the same as this:

library(dplyr)
library(tidyr)
idea <- source %>% 
  select(starts_with("TMC.")) %>% 
  summarise_all(list(Count = sum, Proportion = mean)) %>% 
  pivot_longer(everything(), names_to = c("Category", "summary"), names_sep = "_", "value") %>% 
  pivot_wider(names_from = summary, values_from = value)

but with this:

select(starts_with("Q5_") & (!ends_with("Levels") | !ends_with("Top2"))) %>% 

in two versions:

  1. Overall
  2. Grouped by Qtr

Is this what you are trying to accomplish?

library(tidyverse)

source <- data.frame(
    stringsAsFactors = FALSE,
    URN = c("21GB01293040","21GB01240221",
            "21GB03294610","21GB01309069","21GB03078286",
            "21GB01086060","21GB01169525","21GB01209144",
            "21GB01204925","21GB90063865",
            "21GB01068838","21GB01176411","21GB01215531"),
    Q5_1 = c(0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1),
    Q5_2 = c(0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0),
    Q5_Top2 = c(0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0),
    Q5_ThreeLevels = c(1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0),
    QA31 = c(7, 1, 8, 3, 8, 8, 8, 6, 2, 2, 7, 8, 8),
    Qtr = c(1, 1, 2, 2, 1, 2, 2, 1, 2, 2, 1, 2, 2)
)

source %>% 
    select(matches("^Q5_\\d")) %>% 
    summarise(across(.fns = list(Count = sum, Proportion = mean))) %>% 
    pivot_longer(everything(),
                 names_to = c("Category", "summary"),
                 names_sep = "(?<=_\\d)_",
                 values_to = "value") %>% 
    pivot_wider(names_from = summary,
                values_from = value)
#> # A tibble: 2 × 3
#>   Category Count Proportion
#>   <chr>    <dbl>      <dbl>
#> 1 Q5_1         5      0.385
#> 2 Q5_2         4      0.308

Created on 2022-03-29 by the reprex package (v2.0.1)

If so, the problem is the way you are using the names_sep argument

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.