Converting xtab format -> three (or two) columns?

Given the following data:

df = tibble::tribble(
  ~term,          ~x1, ~x2,          ~x3,          ~x4, ~x5,          ~x6,       ~x7, ~x8, ~x9, ~x10, ~x11,
  "x1",                   NA,       NA,                 NA,                 NA,               NA,                 NA,                 NA,         NA,           NA,       NA,        NA,
  "x2",    1,       NA,                 NA,                 NA,               NA,                 NA,                 NA,         NA,           NA,       NA,        NA,
  "x3",   2,        0,                 NA,                 NA,               NA,                 NA,                 NA,         NA,           NA,       NA,        NA,
  "x4",   1,        0,                  0,                 NA,               NA,                 NA,                 NA,         NA,           NA,       NA,        NA,
  "x5",    1,        0,                  0,                  0,               NA,                 NA,                 NA,         NA,           NA,       NA,        NA,
  "x6",  0,        0, -0.333333333333333,                  0,                0,                 NA,                 NA,         NA,           NA,       NA,        NA,
  "x7",  0,        0,                  0, -0.333333333333333,                0,                  0,                 NA,         NA,           NA,       NA,        NA,
  "x8", 0,        0,                  0,                  0,                0,                  0,                  0,         NA,           NA,       NA,        NA,
  "x9",   0,        0,                  0, -0.333333333333333,                0,                  0, -0.333333333333333,          0,           NA,       NA,        NA,
  "x10",  0,        0,                  0,                  0,                0,                  0,                  0,          0,            0,       NA,        NA,
  "x11",    0,        0, -0.333333333333333,                  0,                0, -0.333333333333333,                  0,          0,            0,        0,        NA
)

I would like to convert this to a three column format.
Essentially, I would like to convert to something like the following:

 v1  v2  value
 x1  x2      1
 x1  x3      2
 x1  x4      1
...

where v1 would be the column, v2 would be the row.

How would I convert this?

This looks like a standard pivot to a long format. I made the assumption you want to drop the rows with an NA value. Most of the steps below only serve to adjust the display of the output to match what you posted.

df = tibble::tribble(
  ~term,          ~x1, ~x2,          ~x3,          ~x4, ~x5,          ~x6,       ~x7, ~x8, ~x9, ~x10, ~x11,
  "x1",                   NA,       NA,                 NA,                 NA,               NA,                 NA,                 NA,         NA,           NA,       NA,        NA,
  "x2",    1,       NA,                 NA,                 NA,               NA,                 NA,                 NA,         NA,           NA,       NA,        NA,
  "x3",   2,        0,                 NA,                 NA,               NA,                 NA,                 NA,         NA,           NA,       NA,        NA,
  "x4",   1,        0,                  0,                 NA,               NA,                 NA,                 NA,         NA,           NA,       NA,        NA,
  "x5",    1,        0,                  0,                  0,               NA,                 NA,                 NA,         NA,           NA,       NA,        NA,
  "x6",  0,        0, -0.333333333333333,                  0,                0,                 NA,                 NA,         NA,           NA,       NA,        NA,
  "x7",  0,        0,                  0, -0.333333333333333,                0,                  0,                 NA,         NA,           NA,       NA,        NA,
  "x8", 0,        0,                  0,                  0,                0,                  0,                  0,         NA,           NA,       NA,        NA,
  "x9",   0,        0,                  0, -0.333333333333333,                0,                  0, -0.333333333333333,          0,           NA,       NA,        NA,
  "x10",  0,        0,                  0,                  0,                0,                  0,                  0,          0,            0,       NA,        NA,
  "x11",    0,        0, -0.333333333333333,                  0,                0, -0.333333333333333,                  0,          0,            0,        0,        NA
)
df
#> # A tibble: 11 × 12
#>    term     x1    x2     x3     x4    x5     x6     x7    x8    x9   x10
#>    <chr> <dbl> <dbl>  <dbl>  <dbl> <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl>
#>  1 x1       NA    NA NA     NA        NA NA     NA        NA    NA    NA
#>  2 x2        1    NA NA     NA        NA NA     NA        NA    NA    NA
#>  3 x3        2     0 NA     NA        NA NA     NA        NA    NA    NA
#>  4 x4        1     0  0     NA        NA NA     NA        NA    NA    NA
#>  5 x5        1     0  0      0        NA NA     NA        NA    NA    NA
#>  6 x6        0     0 -0.333  0         0 NA     NA        NA    NA    NA
#>  7 x7        0     0  0     -0.333     0  0     NA        NA    NA    NA
#>  8 x8        0     0  0      0         0  0      0        NA    NA    NA
#>  9 x9        0     0  0     -0.333     0  0     -0.333     0    NA    NA
#> 10 x10       0     0  0      0         0  0      0         0     0    NA
#> 11 x11       0     0 -0.333  0         0 -0.333  0         0     0     0
#> # … with 1 more variable: x11 <lgl>
library(tidyr)
library(dplyr)

NewDF <- pivot_longer(df, -term, names_to = "v1") %>% 
  filter(!is.na(value)) %>% #steps after here only adjust the appearance 
  rename(v2 = "term") %>% 
  select(v1, v2, value) %>% #reorder the columns
  mutate(across(.cols = c("v1", "v2"), .fns = ~factor(.x, levels = paste0("x", 1:11)))) %>% #adjust how v1 and v2 will sort
  arrange(v1, v2)
NewDF
#> # A tibble: 55 × 3
#>    v1    v2    value
#>    <fct> <fct> <dbl>
#>  1 x1    x2        1
#>  2 x1    x3        2
#>  3 x1    x4        1
#>  4 x1    x5        1
#>  5 x1    x6        0
#>  6 x1    x7        0
#>  7 x1    x8        0
#>  8 x1    x9        0
#>  9 x1    x10       0
#> 10 x1    x11       0
#> # … with 45 more rows

Created on 2022-04-29 by the reprex package (v0.2.1)

1 Like

Thanks

I should have realised pivot_longer() was the answer here.

One quick question though -- I didn't understand this line here (and it seems to give all NA values for v1 and v2). Can you explain what this line is supposed to do?

mutate(across(.cols = c("v1", "v2"), .fns = ~factor(.x, levels = paste0("x", 1:11)))) %>% #adjust how v1 and v2 will sort

If I run the code with that line commented out, the columns v1 and v2 sort alphabetically. Notice that x10 and x11 come before x2.

 NewDF <- pivot_longer(df, -term, names_to = "v1") %>% 
+   filter(!is.na(value)) %>% #steps after here only adjust the appearance 
+   rename(v2 = "term") %>% 
+   select(v1, v2, value) %>% #reorder the columns
+   #mutate(across(.cols = c("v1", "v2"), .fns = ~factor(.x, levels = paste0("x", 1:11)))) %>% #adjust how v1 and v2 will sort
+   arrange(v1, v2)
> NewDF
# A tibble: 55 x 3
   v1    v2    value
   <chr> <chr> <dbl>
 1 x1    x10       0
 2 x1    x11       0
 3 x1    x2        1
 4 x1    x3        2
 5 x1    x4        1
 6 x1    x5        1
 7 x1    x6        0
 8 x1    x7        0
 9 x1    x8        0
10 x1    x9        0
# ... with 45 more rows

The factor function changes the two character vectors v1 and v2 into factors. Its levels argument sets what levels the factor is allowed to have. It also sets the order of those levels. I used paste0() to save myself some typing and I set the order to follow numeric value of the part after the leading x, rather than following the alphabetical order that is the default

> paste0("x", 1:11)
 [1] "x1"  "x2"  "x3"  "x4"  "x5"  "x6"  "x7"  "x8"  "x9"  "x10" "x11"
1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.