How to restore a draft frame that has been pivoted longer

Andrzej · January 22, 2023, 6:49pm

Hi,
I would like to go back to dataframe that has been pivoted longer, I have tried with pivot_wider, no success yet, but still trying.
Any help would be much appreciated, thank you.

dt <- readr::read_fwf("
Listen_Podcast     Transport               City_Bikini_Bottom  City_Frostbite_Falls  City_NewNew_York
Yes                Drive                       13                     17                     5
Yes                Public                      27                     25                    27
No                 Drive                       23                     22                    17
No                 Public                      44                     31                    22
")

dt %<>%
  janitor::row_to_names(row_number = 1)

dt %>%
  pivot_longer(cols = starts_with("City"), names_to = "City", names_prefix = "City_", values_to = "Count")
 %>% relocate(City) %>% arrange(City)

FJCC · January 22, 2023, 7:05pm

library(tidyr)
library(dplyr)

dt <- readr::read_fwf("
Listen_Podcast     Transport               City_Bikini_Bottom  City_Frostbite_Falls  City_NewNew_York
Yes                Drive                       13                     17                     5
Yes                Public                      27                     25                    27
No                 Drive                       23                     22                    17
No                 Public                      44                     31                    22
")
#> Rows: 5 Columns: 5
#> ── Column specification ────────────────────────────────────────────────────────
#> 
#> chr (5): X1, X2, X3, X4, X5
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

dt <- dt |> 
  janitor::row_to_names(row_number = 1)

dtLong <- dt %>%
  pivot_longer(cols = starts_with("City"), names_to = "City", names_prefix = "City_", 
               values_to = "Count") |>  relocate(City) |>  arrange(City)

dtWide <- dtLong |> pivot_wider(names_from = City, values_from = Count, 
                                names_prefix = "City_")

identical(dt, dtWide)
#> [1] TRUE

^{Created on 2023-01-22 with reprex v2.0.2}

Andrzej · January 22, 2023, 7:23pm

Hi, thank you, this is what I wanted. Can you please explain when we should use names_pattern option ? When is it useful to do ?

FJCC · January 22, 2023, 7:44pm

The names_pattern argument is useful when you are pivoting longer and the column names contain various pieces. You may want to use all information in the column names or just some of it. Using the data you posted, the following code ignores the "City_" part of the names and places the two remaining pieces in different columns. Since the parts of the names are defined with regular expression, complex cases can be handled.

dt <- readr::read_fwf("
 Listen_Podcast     Transport               City_Bikini_Bottom  City_Frostbite_Falls  City_NewNew_York
 Yes                Drive                       13                     17                     5
 Yes                Public                      27                     25                    27
 No                 Drive                       23                     22                    17
 No                 Public                      44                     31                    22
 ")
Rows: 5 Columns: 5
── Column specification ──────────────────────────────────────────────────────

chr (5): X1, X2, X3, X4, X5

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

dt <- dt |> 
   janitor::row_to_names(row_number = 1)

dtLong <- dt |> pivot_longer(cols = starts_with("City"),
                              names_to = c("Part1", "Part2"),
                              names_pattern = "City_(.+)_(.+$)",
                              values_to = "Count")
dtLong
# A tibble: 12 × 5
   Listen_Podcast Transport Part1     Part2  Count
   <chr>          <chr>     <chr>     <chr>  <chr>
 1 Yes            Drive     Bikini    Bottom 13   
 2 Yes            Drive     Frostbite Falls  17   
 3 Yes            Drive     NewNew    York   5    
 4 Yes            Public    Bikini    Bottom 27   
 5 Yes            Public    Frostbite Falls  25   
 6 Yes            Public    NewNew    York   27   
 7 No             Drive     Bikini    Bottom 23   
 8 No             Drive     Frostbite Falls  22   
 9 No             Drive     NewNew    York   17   
10 No             Public    Bikini    Bottom 44   
11 No             Public    Frostbite Falls  31   
12 No             Public    NewNew    York   22

Andrzej · January 23, 2023, 6:30am

How does it do it that it places those parts in two different columns ? I am asking because there is no separator defined, so how does

it know that it should be placed in two different columns ? I would be grateful for your reply as somehow I don't see it.

FJCC · January 23, 2023, 7:05am

The parentheses in the regular expression determine what part of the names_pattern is stored in columns. In

names_pattern = "City_(.+)_(.+$)"

the text between the first underscore and the second underscore is matched by (.+) and is stored in the column named Part1. The text between the second underscore and the end of the string is matched by (.+$) and is stored in the column Part2.
If you were to wrap City in parentheses, it would also get a column, though you would have to add a third name to names_to so pivot_longer would know what name to put on each column.

Andrzej · January 23, 2023, 7:47am

Thank you, so does it have to do anything with something called "capturing groups" in Regex ?

FJCC · January 23, 2023, 4:10pm

Yes, the parentheses can be called capturing groups. They are typically used when you are replacing text and in the replacement text you want to refer to items like "the first text found". For example, here the first text is a series of numbers, (\\d+), and the second text is the characters after the numbers to the end of the string. I want to reverse the order of the two texts and put an underscore between them. The \\1 refers to the first text and \\2 refers to the second text.

STR <- c("ABC123DEF", "WER2TYUIN", "BHUUI1287456Z")
sub(pattern = "(\\d+)(.+$)", replacement = "\\2_\\1", x = STR)
[1] "ABCDEF_123"     "WERTYUIN_2"     "BHUUIZ_1287456"

The use in names_pattern is very similar but the captured groups are mapped to the column names set in names_to

Andrzej · January 23, 2023, 8:29pm

Thank you very much, that was really helpful.

I realised that now, as in normal Regex this is denoted as $1 and $2.
Thank you once again.

system · January 30, 2023, 8:30pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.