How do I preserve names when I use pivot_wider on a named list column?

Hi all,

I have a data frame with one named list column that I need to widen. However, when I widen it, the name of each element of the list column becomes an empty string. I need to keep these names for downstream scripts that depend on them.

Hack solution 1:

  • All elements on the same row are named the same as the tissue value for that row.
  • Find a way to apply that value as the name of each element
  • Something like:
test %>% rowwise %>% mutate(across(!tissue), .fns = names(.x) <- tissue)

Hack solution 2 (works but not elegant):

names(test$res_Low_v_High_in_T1) <- test$tissue
names(test$res_Low_v_High_in_T2) <- test$tissue
names(test$res_T2_v_T1_in_High) <- test$tissue
names(test$res_T2_v_T1_in_Low) <- test$tissue
names(test$res_trial_interaction) <- test$tissue
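
As a hedged aside, the five assignments in hack 2 all follow the same pattern, so they can be collapsed into a loop over every column except tissue. This is a minimal sketch that assumes a small stand-in for the wide `test` object; a base data.frame is used only to keep it dependency-free, and the column contents are made up.

```r
# Hypothetical stand-in for the wide `test` object: one row per tissue,
# one list column per contrast (contents are placeholders).
test <- data.frame(tissue = c("Liver", "Duodenum"))
test$res_Low_v_High_in_T1 <- list(1L, 4L)
test$res_trial_interaction <- list(3L, 6L)

# Name every list column after the tissue on its row, instead of writing
# one names() assignment per contrast.
for (col in setdiff(names(test), "tissue")) {
  names(test[[col]]) <- test$tissue
}

names(test$res_Low_v_High_in_T1)
#> [1] "Liver"    "Duodenum"
```

The loop picks up new contrast columns automatically, which is the main advantage over listing each column by hand.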

Does anyone know how to preserve the names (ideally), or how to make hack solution 1 work?

Thanks all in advance.
Kenneth

Original data frame in long format:

> deseq2_results
# A tibble: 25 × 3
   tissue   contrast              results               
   <chr>    <chr>                 <named list>          
 1 Duodenum res_Low_v_High_in_T1  <tibble [15,714 × 15]>
 2 Duodenum res_Low_v_High_in_T2  <tibble [15,714 × 15]>
 3 Duodenum res_T2_v_T1_in_High   <tibble [15,714 × 15]>
 4 Duodenum res_T2_v_T1_in_Low    <tibble [15,714 × 15]>
 5 Duodenum res_trial_interaction <tibble [15,714 × 15]>
 6 Ileum    res_Low_v_High_in_T1  <tibble [16,069 × 15]>
 7 Ileum    res_Low_v_High_in_T2  <tibble [16,069 × 15]>
 8 Ileum    res_T2_v_T1_in_High   <tibble [16,069 × 15]>
 9 Ileum    res_T2_v_T1_in_Low    <tibble [16,069 × 15]>
10 Ileum    res_trial_interaction <tibble [16,069 × 15]>
# ℹ 15 more rows
# ℹ Use `print(n = ...)` to see more rows

> names(deseq2_results$results)
 [1] "Duodenum" "Duodenum" "Duodenum" "Duodenum" "Duodenum" "Ileum"    "Ileum"    "Ileum"    "Ileum"    "Ileum"    "Jejunum" 
[12] "Jejunum"  "Jejunum"  "Jejunum"  "Jejunum"  "Liver"    "Liver"    "Liver"    "Liver"    "Liver"    "Muscle"   "Muscle"  
[23] "Muscle"   "Muscle"   "Muscle"  

Final data frame in wide format:

> test <- deseq2_results %>%
    tidyr::pivot_wider(
    names_from = contrast,
    values_from = results
    )

> test[,1:2]     # just showing one of the new columns
# A tibble: 5 × 2
  tissue   res_Low_v_High_in_T1  
  <chr>    <named list>          
1 Duodenum <tibble [15,714 × 15]>
2 Ileum    <tibble [16,069 × 15]>
3 Jejunum  <tibble [15,889 × 15]>
4 Liver    <tibble [14,325 × 15]>
5 Muscle   <tibble [13,159 × 15]>

> names(test$res_Low_v_High_in_T1)
[1] "" "" "" "" ""

First, I have to say this is an unusual data structure: in my experience, it's not a very good idea to use named lists in a data.frame or tibble. I don't know what your downstream scripts are, but I would be wary. If it were me, I would rewrite some of them to take a more standard input (for example, passing the result and the tissue as two separate inputs).
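
To illustrate the "two inputs" idea, here is a minimal sketch. The downstream function `summarise_result()` is hypothetical (a stand-in for whatever the real scripts do), and the long-format data is fake; the point is only that the tissue travels as its own argument, so no list names are needed at all.

```r
# Hypothetical downstream function: receives the result and the tissue as
# two explicit arguments instead of reading the tissue from a list name.
summarise_result <- function(result, tissue) {
  sprintf("%s: n = %d", tissue, nrow(result))
}

# Hypothetical long-format data: one row per tissue x contrast.
long <- data.frame(
  tissue   = c("Liver", "Liver", "Duodenum"),
  contrast = c("res_Low_v_High_in_T1", "res_trial_interaction", "res_Low_v_High_in_T1")
)
long$results <- list(
  data.frame(gene = c("a", "b")),
  data.frame(gene = "a"),
  data.frame(gene = c("a", "b", "c"))
)

# Walk the two columns in parallel; Map() pairs results[[i]] with tissue[i].
summaries <- unlist(Map(summarise_result, long$results, long$tissue), use.names = FALSE)
summaries
```

With this shape the data can stay in long format, where tissue and contrast are ordinary columns.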

Anyway...

Your hack 1 is almost there, but it has a few problems:

  • you are working column-wise (replacing the names of an entire column at once), so there is no need for rowwise()
  • the function to apply has to be inside across(); you are closing the parenthesis too early
  • in .fns, you need to define .x. This is usually done with ~ (or an anonymous function such as \(.x)), but as written you are not providing a function to .fns at all
  • the assignment operator <- returns the value assigned, so .fns = ~ {names(.x) <- tissue} is a function that returns the names (i.e. tissue), and you would be replacing the values with their names. You need to return the correct value (the renamed list).
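
The last point is easy to check in plain R, outside of dplyr. In this sketch, `rename_broken()` and `rename_fixed()` are hypothetical helpers that mimic the two versions of the .fns body; `tissue` is a free variable here, standing in for the tissue column that the data mask provides inside mutate().

```r
tissue <- c("Liver", "Duodenum")

# Mirrors the broken .fns body: the last expression is the assignment,
# and an assignment evaluates to the value assigned (`tissue`), not to `x`.
rename_broken <- function(x) {
  names(x) <- tissue
}

# Fixed version: after assigning the names, return the renamed object.
rename_fixed <- function(x) {
  names(x) <- tissue
  x
}

res <- list(1L, 4L)
rename_broken(res)  # invisibly returns tissue, a character vector
rename_fixed(res)   # returns the list, now named Liver / Duodenum
```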

This should do what you want:

library(tidyverse)

# fake data
test <- tibble(tissue = c("Liver","Duodenum") |> rep(each = 3),
               contrast = c("res_Low_v_High_in_T1", "res_T2_v_T1_in_High", "res_trial_interaction") |> rep(times = 2),
               results = as.list(1:6) |> setNames(tissue))

test_wide <- test |>
  pivot_wider(names_from = contrast,
              values_from = results)

test_wide_renamed <- test_wide %>%
  mutate(across(!tissue, \(.x) {names(.x) <- tissue; .x}))

test_wide_renamed
#> # A tibble: 2 × 4
#>   tissue   res_Low_v_High_in_T1 res_T2_v_T1_in_High res_trial_interaction
#>   <chr>    <named list>         <named list>        <named list>         
#> 1 Liver    <int [1]>            <int [1]>           <int [1]>            
#> 2 Duodenum <int [1]>            <int [1]>           <int [1]>

test_wide_renamed$res_Low_v_High_in_T1
#> $Liver
#> [1] 1
#> 
#> $Duodenum
#> [1] 4

Created on 2024-08-23 with reprex v2.1.0
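
A possibly tidier variant of the same mutate() call, sketched here as an alternative rather than a tested replacement: stats::setNames() returns the renamed object, so the lambda no longer needs braces or an explicit .x at the end. The wide tibble below is fake and only shaped like the pivot_wider() output above.

```r
library(dplyr)

# Hypothetical wide tibble shaped like the pivot_wider() output above.
test_wide <- tibble(
  tissue = c("Liver", "Duodenum"),
  res_Low_v_High_in_T1  = list(1L, 4L),
  res_trial_interaction = list(3L, 6L)
)

# setNames() returns the renamed list, so it can be the whole lambda body;
# `tissue` is looked up in the data mask, as in the version above.
test_wide_renamed <- test_wide |>
  mutate(across(!tissue, \(x) setNames(x, tissue)))

names(test_wide_renamed$res_Low_v_High_in_T1)
```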


Yes, I've not used this structure before, and its sole purpose is to keep all the relevant data in one object. So... let's see how it goes.

Thank you for the detailed explanation of the solution.
