Hi,
I would like to go back to dataframe that has been pivoted longer, I have tried with pivot_wider, no success yet, but still trying.
Any help would be much appreciated, thank you.
The names_pattern argument is useful when you are pivoting longer and the column names contain various pieces. You may want to use all information in the column names or just some of it. Using the data you posted, the following code ignores the "City_" part of the names and places the two remaining pieces in different columns. Since the parts of the names are defined with regular expression, complex cases can be handled.
dt <- readr::read_fwf("
Listen_Podcast Transport City_Bikini_Bottom City_Frostbite_Falls City_NewNew_York
Yes Drive 13 17 5
Yes Public 27 25 27
No Drive 23 22 17
No Public 44 31 22
")
Rows: 5 Columns: 5
── Column specification ──────────────────────────────────────────────────────
chr (5): X1, X2, X3, X4, X5
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
dt <- dt |>
janitor::row_to_names(row_number = 1)
dtLong <- dt |> pivot_longer(cols = starts_with("City"),
names_to = c("Part1", "Part2"),
names_pattern = "City_(.+)_(.+$)",
values_to = "Count")
dtLong
# A tibble: 12 × 5
Listen_Podcast Transport Part1 Part2 Count
<chr> <chr> <chr> <chr> <chr>
1 Yes Drive Bikini Bottom 13
2 Yes Drive Frostbite Falls 17
3 Yes Drive NewNew York 5
4 Yes Public Bikini Bottom 27
5 Yes Public Frostbite Falls 25
6 Yes Public NewNew York 27
7 No Drive Bikini Bottom 23
8 No Drive Frostbite Falls 22
9 No Drive NewNew York 17
10 No Public Bikini Bottom 44
11 No Public Frostbite Falls 31
12 No Public NewNew York 22
The parentheses in the regular expression determine what part of the names_pattern is stored in columns. In
names_pattern = "City_(.+)_(.+$)"
the text between the first underscore and the second underscore is matched by (.+) and is stored in the column named Part1. The text between the second underscore and the end of the string is matched by (.+$) and is stored in the column Part2.
If you were to wrap City in parentheses, it would also get a column, though you would have to add a third name to names_to so pivot_longer would know what name to put on each column.
Yes, the parentheses can be called capturing groups. They are typically used when you are replacing text and in the replacement text you want to refer to items like "the first text found". For example, here the first text is a series of numbers, (\\d+), and the second text is the characters after the numbers to the end of the string. I want to reverse the order of the two texts and put an underscore between them. The \\1 refers to the first text and \\2 refers to the second text.