the function pivot_wider()

hello_world1 · December 4, 2021, 9:00pm

I have to use a data frame to compute another data frame with primary keys
Country, Gender and Year, and with one column for each Age from 0 up to 110.

I don't understand how to use the pivot_longer and pivot_wider functions...

Thank you

JackDavison · December 4, 2021, 9:12pm

Start by loading tidyr.

library(tidyr)

This contains a couple of data frames that will be useful for learning, the first being tidyr::billboard. Notice the structure of this data frame - there is information as to the artist, the track, the date, and then the billboard ranking for a load of weeks - each week in its own column.

> tidyr::billboard
# A tibble: 317 x 79
   artist   track   date.entered   wk1   wk2   wk3   wk4   wk5   wk6   wk7   wk8   wk9  wk10  wk11  wk12  wk13  wk14  wk15  wk16  wk17
   <chr>    <chr>   <date>       <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1 2 Pac    Baby D~ 2000-02-26      87    82    72    77    87    94    99    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
 2 2Ge+her  The Ha~ 2000-09-02      91    87    92    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
 3 3 Doors~ Krypto~ 2000-04-08      81    70    68    67    66    57    54    53    51    51    51    51    47    44    38    28    22
 4 3 Doors~ Loser   2000-10-21      76    76    72    69    67    65    55    59    62    61    61    59    61    66    72    76    75
 5 504 Boyz Wobble~ 2000-04-15      57    34    25    17    17    31    36    49    53    57    64    70    75    76    78    85    92
 6 98^0     Give M~ 2000-08-19      51    39    34    26    26    19     2     2     3     6     7    22    29    36    47    67    66
 7 A*Teens  Dancin~ 2000-07-08      97    97    96    95   100    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
 8 Aaliyah  I Don'~ 2000-01-29      84    62    51    41    38    35    35    38    38    36    37    37    38    49    61    63    62
 9 Aaliyah  Try Ag~ 2000-03-18      59    53    38    28    21    18    16    14    12    10     9     8     6     1     2     2     2
10 Adams, ~ Open M~ 2000-08-26      76    76    74    69    68    67    61    58    57    59    66    68    61    67    59    63    67
# ... with 307 more rows, and 59 more variables: wk18 <dbl>, wk19 <dbl>, wk20 <dbl>, wk21 <dbl>, wk22 <dbl>, wk23 <dbl>, wk24 <dbl>,
#   wk25 <dbl>, wk26 <dbl>, wk27 <dbl>, wk28 <dbl>, wk29 <dbl>, wk30 <dbl>, wk31 <dbl>, wk32 <dbl>, wk33 <dbl>, wk34 <dbl>,
#   wk35 <dbl>, wk36 <dbl>, wk37 <dbl>, wk38 <dbl>, wk39 <dbl>, wk40 <dbl>, wk41 <dbl>, wk42 <dbl>, wk43 <dbl>, wk44 <dbl>,
#   wk45 <dbl>, wk46 <dbl>, wk47 <dbl>, wk48 <dbl>, wk49 <dbl>, wk50 <dbl>, wk51 <dbl>, wk52 <dbl>, wk53 <dbl>, wk54 <dbl>,
#   wk55 <dbl>, wk56 <dbl>, wk57 <dbl>, wk58 <dbl>, wk59 <dbl>, wk60 <dbl>, wk61 <dbl>, wk62 <dbl>, wk63 <dbl>, wk64 <dbl>,
#   wk65 <dbl>, wk66 <lgl>, wk67 <lgl>, wk68 <lgl>, wk69 <lgl>, wk70 <lgl>, wk71 <lgl>, wk72 <lgl>, wk73 <lgl>, wk74 <lgl>,
#   wk75 <lgl>, wk76 <lgl>

This layout is awkward to work with in a "tidy" way. Note what happens when we run pivot_longer on this data and specify the columns that (in this case) we don't want to pivot. All of the "week" columns have effectively been stacked on top of each other: their names go into a new "week" column and their contents into a new "position" column.

tidyr::pivot_longer(tidyr::billboard, -c(artist, track, date.entered),
                    names_to = "week", values_to = "position")
#> # A tibble: 24,092 x 5
#>    artist track                   date.entered week  position
#>    <chr>  <chr>                   <date>       <chr>    <dbl>
#>  1 2 Pac  Baby Don't Cry (Keep... 2000-02-26   wk1         87
#>  2 2 Pac  Baby Don't Cry (Keep... 2000-02-26   wk2         82
#>  3 2 Pac  Baby Don't Cry (Keep... 2000-02-26   wk3         72
#>  4 2 Pac  Baby Don't Cry (Keep... 2000-02-26   wk4         77
#>  5 2 Pac  Baby Don't Cry (Keep... 2000-02-26   wk5         87
#>  6 2 Pac  Baby Don't Cry (Keep... 2000-02-26   wk6         94
#>  7 2 Pac  Baby Don't Cry (Keep... 2000-02-26   wk7         99
#>  8 2 Pac  Baby Don't Cry (Keep... 2000-02-26   wk8         NA
#>  9 2 Pac  Baby Don't Cry (Keep... 2000-02-26   wk9         NA
#> 10 2 Pac  Baby Don't Cry (Keep... 2000-02-26   wk10        NA
#> # ... with 24,082 more rows
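As a possible follow-up, the same call can be tightened so that the week becomes a number and the empty rows are dropped. This is only a sketch: it assumes tidyr 1.1.0 or later (for names_transform), and the name billboard_long is made up for illustration.

```r
library(tidyr)

# Same reshape as above, selecting the week columns by their prefix.
billboard_long <- pivot_longer(
  billboard,
  cols = starts_with("wk"),
  names_to = "week",
  names_prefix = "wk",                       # strip "wk" from the old column names
  names_transform = list(week = as.integer), # store the week as an integer
  values_to = "position",
  values_drop_na = TRUE                      # drop weeks where the track had no ranking
)

head(billboard_long)
```

Selecting with starts_with("wk") is equivalent here to excluding artist, track and date.entered; which one reads better is a matter of taste.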

Now let's consider another data frame from tidyr called "fish_encounters". This shows if a particular fish was seen at a particular monitoring station.

tidyr::fish_encounters
#> # A tibble: 114 x 3
#>    fish  station  seen
#>    <fct> <fct>   <int>
#>  1 4842  Release     1
#>  2 4842  I80_1       1
#>  3 4842  Lisbon      1
#>  4 4842  Rstr        1
#>  5 4842  Base_TD     1
#>  6 4842  BCE         1
#>  7 4842  BCW         1
#>  8 4842  BCE2        1
#>  9 4842  BCW2        1
#> 10 4842  MAE         1
#> # ... with 104 more rows

Note what happens if we use pivot_wider, specifying where the column headers come from and where the values come from:

tidyr::fish_encounters |> 
  tidyr::pivot_wider(names_from = station, values_from = seen)
#> # A tibble: 19 x 12
#>    fish  Release I80_1 Lisbon  Rstr Base_TD   BCE   BCW  BCE2  BCW2   MAE   MAW
#>    <fct>   <int> <int>  <int> <int>   <int> <int> <int> <int> <int> <int> <int>
#>  1 4842        1     1      1     1       1     1     1     1     1     1     1
#>  2 4843        1     1      1     1       1     1     1     1     1     1     1
#>  3 4844        1     1      1     1       1     1     1     1     1     1     1
#>  4 4845        1     1      1     1       1    NA    NA    NA    NA    NA    NA
#>  5 4847        1     1      1    NA      NA    NA    NA    NA    NA    NA    NA
#>  6 4848        1     1      1     1      NA    NA    NA    NA    NA    NA    NA
#>  7 4849        1     1     NA    NA      NA    NA    NA    NA    NA    NA    NA
#>  8 4850        1     1     NA     1       1     1     1    NA    NA    NA    NA
#>  9 4851        1     1     NA    NA      NA    NA    NA    NA    NA    NA    NA
#> 10 4854        1     1     NA    NA      NA    NA    NA    NA    NA    NA    NA
#> 11 4855        1     1      1     1       1    NA    NA    NA    NA    NA    NA
#> 12 4857        1     1      1     1       1     1     1     1     1    NA    NA
#> 13 4858        1     1      1     1       1     1     1     1     1     1     1
#> 14 4859        1     1      1     1       1    NA    NA    NA    NA    NA    NA
#> 15 4861        1     1      1     1       1     1     1     1     1     1     1
#> 16 4862        1     1      1     1       1     1     1     1     1    NA    NA
#> 17 4863        1     1     NA    NA      NA    NA    NA    NA    NA    NA    NA
#> 18 4864        1     1     NA    NA      NA    NA    NA    NA    NA    NA    NA
#> 19 4865        1     1      1    NA      NA    NA    NA    NA    NA    NA    NA
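The NA values in that output mark fish/station pairs that had no row in the long data. If a zero is more meaningful than a missing value, pivot_wider can fill those gaps itself. A small sketch (the name fish_wide is made up for illustration):

```r
library(tidyr)

# Widen as above, but write 0 wherever a fish/station combination
# was absent from the long data instead of leaving NA.
fish_wide <- pivot_wider(
  fish_encounters,
  names_from = station,
  values_from = seen,
  values_fill = 0
)

fish_wide
```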

The "pivoting" functions in tidyr are all about reshaping data. This is a conceptually difficult thing to do, so don't be disheartened if it isn't immediately obvious. I would recommend reading the documentation for the tidyr functions, available here:

If you would like someone to demonstrate using your own data, you may want to provide a reproducible example.

hello_world1 · December 4, 2021, 9:38pm

Thank you so much for your explanations!

But I still have a problem... I don't get the result that I want...

Here is my code:

```{r}
library(tidyr)

life_table_pivot <- life_table %>%
  pivot_longer(cols = -c("Country", "Gender", "Year"), names_to = "Column", values_to = "Age")
life_table_pivot
```

And I didn't get what I wanted...

I'm supposed to have something like this:

JackDavison · December 4, 2021, 9:41pm

It's difficult to know how to solve your issues without access to your data I'm afraid.

FAQ: What's a reproducible example (reprex) and how do I create one?

An easy way of providing your data is by providing the output of dput(life_table) if the data isn't too big.
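In the meantime, here is a rough sketch of the reshape described in the first post, run on a toy data frame. Everything about the data is an assumption: the real life_table has not been shared, so the one-row-per-age layout, the value column qx and all the numbers below are invented purely for illustration.

```r
library(tidyr)

# Hypothetical stand-in for life_table: one row per Country/Gender/Year/Age.
# The column `qx` and every value here are made up.
life_table_toy <- data.frame(
  Country = c("France", "France", "France", "France"),
  Gender  = c("Female", "Female", "Male", "Male"),
  Year    = c(2000, 2000, 2000, 2000),
  Age     = c(0, 1, 0, 1),
  qx      = c(0.004, 0.0003, 0.005, 0.0004)
)

# One row per Country/Gender/Year, one column per Age.
life_table_wide <- pivot_wider(
  life_table_toy,
  id_cols = c(Country, Gender, Year),
  names_from = Age,
  values_from = qx
)

life_table_wide
```

If the real data is already in this long shape, the same call with the real column names may be all that is needed. If it is large, dput(head(life_table, 20)) would give a sample small enough to paste.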
