Tidy way to switch rows and columns in tibble

startz · April 11, 2024, 5:33pm

Is there a tidy way to swap columns and rows in a tibble?

For example if I start with

I'd like to end up with

Currently I am doing this with t() (transpose), but this seems terribly inelegant.

unrelated · April 11, 2024, 7:30pm

Would be great if you could provide a reproducible example, with code I and other users can copy, instead of a screenshot.

In the meantime, you may want to explore tidyr::pivot_longer(). It's not quite transposing, but it takes column names and turns them into vectors. If there are no columns that you want to keep between the old and new, though, simply transposing should work fine.

startz · April 11, 2024, 7:52pm

pivot_longer doesn't do the job, at least not in a way I can figure

"262416"), ...4 = c("2008-09", "260327"), ...5 = c("2009-10", 
"270604"), ...6 = c("2010-11", "273996"), ...7 = c("2011-12", 
"283332"), ...8 = c("2012-13", "289408"), ...9 = c("2013-14", 
"304467"), ...10 = c("2014-15", "313415"), ...11 = c("2015-16", 
"325339"), ...12 = c("2016-17", "332727"), ...13 = c("2017-18", 
"341751"), ...14 = c("2018-19", "347099"), ...15 = c("2019-20\\1\\", 
"162633"), ...16 = c("2020-21\\2\\", "14549")), row.names = c(NA, 
-2L), class = c("tbl_df", "tbl", "data.frame"))

Fixed up version is

cleaned <- as_tibble(t(cleaned), .name_repair = "minimal")
colnames(cleaned) <- c("Year", "Students")

cleaned <- structure(list(Year = c("2000–01", "2007–08", "2008-09", 
"2009-10", "2010-11", "2011-12", "2012-13", "2013-14", "2014-15", 
"2015-16", "2016-17", "2017-18", "2018-19", "2019-20\\1\\", "2020-21\\2\\"
), Students = c("154168", "262416", "260327", "270604", "273996", 
"283332", "289408", "304467", "313415", "325339", "332727", "341751", 
"347099", "162633", "14549")), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -15L))

dromano · April 11, 2024, 8:26pm

Here's one approach, and I'll add comments later.

matrix(
  data = 2001:2006 |> c(1:6),
  ncol = 6,
  byrow = T
) |> 
  as.data.frame() -> wide

wide
#>     V1   V2   V3   V4   V5   V6
#> 1 2001 2002 2003 2004 2005 2006
#> 2    1    2    3    4    5    6

library(tidyverse)

wide |> 
  mutate(type = c('date', 'student')) |> 
  pivot_longer(V1:V6) |> 
  pivot_wider(names_from = type, values_from = value) |> 
  select(!name)
#> # A tibble: 6 × 2
#>    date student
#>   <int>   <int>
#> 1  2001       1
#> 2  2002       2
#> 3  2003       3
#> 4  2004       4
#> 5  2005       5
#> 6  2006       6

startz · April 11, 2024, 8:32pm

This looks incredibly clever. I will forward to the comments and learning more.

dromano · April 11, 2024, 10:16pm

Here's one approach, starting from a table similar to the one from your post. (Full reprex is as bottom of post.)

matrix(
  data = 2001:2006 |> c(1:6),
  ncol = 6,
  byrow = T
) |> 
  as.data.frame() -> wide

wide
#>     V1   V2   V3   V4   V5   V6
#> 1 2001 2002 2003 2004 2005 2006
#> 2    1    2    3    4    5    6

library(tidyverse)

wide |> 
  mutate(type = c('date', 'student')) |> 
  pivot_longer(V1:V6) |> 
  pivot_wider(names_from = type, values_from = value) |> 
  select(!name)
#> # A tibble: 6 × 2
#>    date student
#>   <int>   <int>
#> 1  2001       1
#> 2  2002       2
#> 3  2003       3
#> 4  2004       4
#> 5  2005       5
#> 6  2006       6

And here is what is going on: If you try the pivot_longer() function by itself, it doesn't allow you to distinguish between dates and students,

wide |> 
  pivot_longer(V1:V6)
#> # A tibble: 12 × 2
#>    name  value
#>    <chr> <int>
#>  1 V1     2001
#>  2 V2     2002
#>  3 V3     2003
#>  4 V4     2004
#>  5 V5     2005
#>  6 V6     2006
#>  7 V1        1
#>  8 V2        2
#>  9 V3        3
#> 10 V4        4
#> 11 V5        5
#> 12 V6        6

but if you first add new column that contains the future columns names date and student as values,

wide |> 
  mutate(type = c('date', 'student')) 
#>     V1   V2   V3   V4   V5   V6    type
#> 1 2001 2002 2003 2004 2005 2006    date
#> 2    1    2    3    4    5    6 student

pivot_longer()can turn them into a kind of label to distinguish between date and student values:

wide |> 
  mutate(type = c('date', 'student')) |> 
  pivot_longer(V1:V6)
#> # A tibble: 12 × 3
#>    type    name  value
#>    <chr>   <chr> <int>
#>  1 date    V1     2001
#>  2 date    V2     2002
#>  3 date    V3     2003
#>  4 date    V4     2004
#>  5 date    V5     2005
#>  6 date    V6     2006
#>  7 student V1        1
#>  8 student V2        2
#>  9 student V3        3
#> 10 student V4        4
#> 11 student V5        5
#> 12 student V6        6

From here, you can know use pivot_wider to turn the labels into column names,

wide |> 
  mutate(type = c('date', 'student')) |> 
  pivot_longer(V1:V6) |> 
  pivot_wider(names_from = type, values_from = value)
#> # A tibble: 6 × 3
#>   name   date student
#>   <chr> <int>   <int>
#> 1 V1     2001       1
#> 2 V2     2002       2
#> 3 V3     2003       3
#> 4 V4     2004       4
#> 5 V5     2005       5
#> 6 V6     2006       6

and then remove the column name, which is no longer needed:

wide |> 
  mutate(type = c('date', 'student')) |> 
  pivot_longer(V1:V6) |> 
  pivot_wider(names_from = type, values_from = value) |> 
  select(!name)
#> # A tibble: 6 × 2
#>    date student
#>   <int>   <int>
#> 1  2001       1
#> 2  2002       2
#> 3  2003       3
#> 4  2004       4
#> 5  2005       5
#> 6  2006       6

^{Created on 2024-04-11 with reprex v2.0.2}

full reprex

matrix(
  data = 2001:2006 |> c(1:6),
  ncol = 6,
  byrow = T
) |> 
  as.data.frame() -> wide

wide
#>     V1   V2   V3   V4   V5   V6
#> 1 2001 2002 2003 2004 2005 2006
#> 2    1    2    3    4    5    6

library(tidyverse)

wide |> 
  mutate(type = c('date', 'student')) |> 
  pivot_longer(V1:V6) |> 
  pivot_wider(names_from = type, values_from = value) |> 
  select(!name)
#> # A tibble: 6 × 2
#>    date student
#>   <int>   <int>
#> 1  2001       1
#> 2  2002       2
#> 3  2003       3
#> 4  2004       4
#> 5  2005       5
#> 6  2006       6

# pivot longer by itself doesn't allow you to distinguish
# between dates and students
wide |> 
  pivot_longer(V1:V6)
#> # A tibble: 12 × 2
#>    name  value
#>    <chr> <int>
#>  1 V1     2001
#>  2 V2     2002
#>  3 V3     2003
#>  4 V4     2004
#>  5 V5     2005
#>  6 V6     2006
#>  7 V1        1
#>  8 V2        2
#>  9 V3        3
#> 10 V4        4
#> 11 V5        5
#> 12 V6        6

# but if you add new column that contains the future 
# columns names 'date' and 'student' as values:
wide |> 
  mutate(type = c('date', 'student')) 
#>     V1   V2   V3   V4   V5   V6    type
#> 1 2001 2002 2003 2004 2005 2006    date
#> 2    1    2    3    4    5    6 student

#  pivot_longer can turn them into a kind of label to 
# distinguish between date and student values
wide |> 
  mutate(type = c('date', 'student')) |> 
  pivot_longer(V1:V6)
#> # A tibble: 12 × 3
#>    type    name  value
#>    <chr>   <chr> <int>
#>  1 date    V1     2001
#>  2 date    V2     2002
#>  3 date    V3     2003
#>  4 date    V4     2004
#>  5 date    V5     2005
#>  6 date    V6     2006
#>  7 student V1        1
#>  8 student V2        2
#>  9 student V3        3
#> 10 student V4        4
#> 11 student V5        5
#> 12 student V6        6

# from here, you can know use pivot_wider to turn
# the labels into column names
wide |> 
  mutate(type = c('date', 'student')) |> 
  pivot_longer(V1:V6) |> 
  pivot_wider(names_from = type, values_from = value)
#> # A tibble: 6 × 3
#>   name   date student
#>   <chr> <int>   <int>
#> 1 V1     2001       1
#> 2 V2     2002       2
#> 3 V3     2003       3
#> 4 V4     2004       4
#> 5 V5     2005       5
#> 6 V6     2006       6

# and then remove the column 'name', which is no longer needed
wide |> 
  mutate(type = c('date', 'student')) |> 
  pivot_longer(V1:V6) |> 
  pivot_wider(names_from = type, values_from = value) |> 
  select(!name)
#> # A tibble: 6 × 2
#>    date student
#>   <int>   <int>
#> 1  2001       1
#> 2  2002       2
#> 3  2003       3
#> 4  2004       4
#> 5  2005       5
#> 6  2006       6

^{Created on 2024-04-11 with reprex v2.0.2}

system · April 18, 2024, 10:16pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.