I have to use a data frame to compute another dataframe with primary keys
Country , Gender and Year and with a column for each Age from 0 up to 110 .
I don't understand how to use the pivot_longer and pivot_wider functions...
Thank you
I have to use a data frame to compute another dataframe with primary keys
Country , Gender and Year and with a column for each Age from 0 up to 110 .
I don't understand how to use the pivot_longer and pivot_wider functions...
Thank you
Start by loading tidyr
.
library(tidyr)
This contains a couple of data frames that will be useful for learning, the first being tidyr::billboard
. Notice the structure of this data frame - there is information as to the artist, the track, the date, and then the billboard ranking for a load of weeks - each week in its own column.
> tidyr::billboard
# A tibble: 317 x 79
artist track date.entered wk1 wk2 wk3 wk4 wk5 wk6 wk7 wk8 wk9 wk10 wk11 wk12 wk13 wk14 wk15 wk16 wk17
<chr> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2 Pac Baby D~ 2000-02-26 87 82 72 77 87 94 99 NA NA NA NA NA NA NA NA NA NA
2 2Ge+her The Ha~ 2000-09-02 91 87 92 NA NA NA NA NA NA NA NA NA NA NA NA NA NA
3 3 Doors~ Krypto~ 2000-04-08 81 70 68 67 66 57 54 53 51 51 51 51 47 44 38 28 22
4 3 Doors~ Loser 2000-10-21 76 76 72 69 67 65 55 59 62 61 61 59 61 66 72 76 75
5 504 Boyz Wobble~ 2000-04-15 57 34 25 17 17 31 36 49 53 57 64 70 75 76 78 85 92
6 98^0 Give M~ 2000-08-19 51 39 34 26 26 19 2 2 3 6 7 22 29 36 47 67 66
7 A*Teens Dancin~ 2000-07-08 97 97 96 95 100 NA NA NA NA NA NA NA NA NA NA NA NA
8 Aaliyah I Don'~ 2000-01-29 84 62 51 41 38 35 35 38 38 36 37 37 38 49 61 63 62
9 Aaliyah Try Ag~ 2000-03-18 59 53 38 28 21 18 16 14 12 10 9 8 6 1 2 2 2
10 Adams, ~ Open M~ 2000-08-26 76 76 74 69 68 67 61 58 57 59 66 68 61 67 59 63 67
# ... with 307 more rows, and 59 more variables: wk18 <dbl>, wk19 <dbl>, wk20 <dbl>, wk21 <dbl>, wk22 <dbl>, wk23 <dbl>, wk24 <dbl>,
# wk25 <dbl>, wk26 <dbl>, wk27 <dbl>, wk28 <dbl>, wk29 <dbl>, wk30 <dbl>, wk31 <dbl>, wk32 <dbl>, wk33 <dbl>, wk34 <dbl>,
# wk35 <dbl>, wk36 <dbl>, wk37 <dbl>, wk38 <dbl>, wk39 <dbl>, wk40 <dbl>, wk41 <dbl>, wk42 <dbl>, wk43 <dbl>, wk44 <dbl>,
# wk45 <dbl>, wk46 <dbl>, wk47 <dbl>, wk48 <dbl>, wk49 <dbl>, wk50 <dbl>, wk51 <dbl>, wk52 <dbl>, wk53 <dbl>, wk54 <dbl>,
# wk55 <dbl>, wk56 <dbl>, wk57 <dbl>, wk58 <dbl>, wk59 <dbl>, wk60 <dbl>, wk61 <dbl>, wk62 <dbl>, wk63 <dbl>, wk64 <dbl>,
# wk65 <dbl>, wk66 <lgl>, wk67 <lgl>, wk68 <lgl>, wk69 <lgl>, wk70 <lgl>, wk71 <lgl>, wk72 <lgl>, wk73 <lgl>, wk74 <lgl>,
# wk75 <lgl>, wk76 <lgl>
This is sub-optimal to work in a "tidy" way. Note what happens when we run pivot_longer
on this data and specify the columns that (in this case) we don't want pivoting. All of the "week" columns have effectively been stacked on top of each other, with their contents placed in a new column.
tidyr::pivot_longer(tidyr::billboard, -c(artist, track, date.entered),
names_to = "week", values_to = "position")
#> # A tibble: 24,092 x 5
#> artist track date.entered week position
#> <chr> <chr> <date> <chr> <dbl>
#> 1 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk1 87
#> 2 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk2 82
#> 3 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk3 72
#> 4 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk4 77
#> 5 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk5 87
#> 6 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk6 94
#> 7 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk7 99
#> 8 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk8 NA
#> 9 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk9 NA
#> 10 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk10 NA
#> # ... with 24,082 more rows
Now let's consider another data frame from tidyr
called "fish_encounters". This shows if a particular fish was seen at a particular monitoring station.
tidyr::fish_encounters
#> # A tibble: 114 x 3
#> fish station seen
#> <fct> <fct> <int>
#> 1 4842 Release 1
#> 2 4842 I80_1 1
#> 3 4842 Lisbon 1
#> 4 4842 Rstr 1
#> 5 4842 Base_TD 1
#> 6 4842 BCE 1
#> 7 4842 BCW 1
#> 8 4842 BCE2 1
#> 9 4842 BCW2 1
#> 10 4842 MAE 1
#> # ... with 104 more rows
#
Note what happens if we use pivot_wider
, specifying where the column headers come from and where the values come from:
tidyr::fish_encounters |>
tidyr::pivot_wider(names_from = station, values_from = seen)
#> # A tibble: 19 x 12
#> fish Release I80_1 Lisbon Rstr Base_TD BCE BCW BCE2 BCW2 MAE MAW
#> <fct> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1 4842 1 1 1 1 1 1 1 1 1 1 1
#> 2 4843 1 1 1 1 1 1 1 1 1 1 1
#> 3 4844 1 1 1 1 1 1 1 1 1 1 1
#> 4 4845 1 1 1 1 1 NA NA NA NA NA NA
#> 5 4847 1 1 1 NA NA NA NA NA NA NA NA
#> 6 4848 1 1 1 1 NA NA NA NA NA NA NA
#> 7 4849 1 1 NA NA NA NA NA NA NA NA NA
#> 8 4850 1 1 NA 1 1 1 1 NA NA NA NA
#> 9 4851 1 1 NA NA NA NA NA NA NA NA NA
#> 10 4854 1 1 NA NA NA NA NA NA NA NA NA
#> 11 4855 1 1 1 1 1 NA NA NA NA NA NA
#> 12 4857 1 1 1 1 1 1 1 1 1 NA NA
#> 13 4858 1 1 1 1 1 1 1 1 1 1 1
#> 14 4859 1 1 1 1 1 NA NA NA NA NA NA
#> 15 4861 1 1 1 1 1 1 1 1 1 1 1
#> 16 4862 1 1 1 1 1 1 1 1 1 NA NA
#> 17 4863 1 1 NA NA NA NA NA NA NA NA NA
#> 18 4864 1 1 NA NA NA NA NA NA NA NA NA
#> 19 4865 1 1 1 NA NA NA NA NA NA NA NA
The "pivoting" functions in tidyr
are all about reshaping data. This is a conceptually difficult thing to do, so don't be disheartened it isn't immediately obvious. I think I would recommend reading the documentation for the tidyr
functions available here:
If you would like someone to demonstrate using your own data, you may want to provide a reproducible example.
Thank you so much for your explanations !
But I still have a problem... I don't get the resultat that I want...
Here is my code:
```{r}
library(tidyr)
life_table_pivot <- life_table %>%
pivot_longer(cols = -c("Country", "Gender", "Year"),names_to = "Column" , values_to = "Age")
life_table_pivot
And I didn't get what I wanted...
I'm supposed to have something like this :
It's difficult to know how to solve your issues without access to your data I'm afraid.
FAQ: What's a reproducible example (reprex
) and how do I create one?
An easy way of providing your data is by providing the output of dput(life_table)
if the data isn't too big.
This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.
If you have a query related to it or one of the replies, start a new topic and refer back with a link.