Transform objects inside nested lists

Hi all,

It is difficult for me to create a sample data set for this one, so I will try to describe it as clearly as possible.

Basically, I want to access the data inside nested lists, not just the names of the data.

I have a list object called res_list. It contains 5 list objects (1 for each of 5 tissues). Each of these list objects contains 3 DESeqResults class objects. So a total of 15 deseqresults objects stored.

res_list[[1]][1]  # extracts the first deseqresults object in the first tissue
res_list[[1]][2]  # extracts the second deseqresults object in the first tissue
res_list[[1]][3]  # extracts the third deseqresults object in the first tissue
res_list[[2]][1]  # extracts the first deseqresults object in the second tissue
res_list[[2]][2]  # and so on...
...
res_list[[5]][3]  # extracts the final deseqresults object in the final tissue

I need to transform each of the 15 DESeqResults objects (adding/rearranging columns, filtering/ordering rows, etc.), but I am struggling to access them. I have thought of a for loop in a for loop, an lapply in an lapply, and so on.

There is quite a lot of transformation to do, so I don't want to spend days coding myself into a dead end. I will be storing the resulting transformed data frames in the same way in a new list object.

My best attempt so far has been:

for (tissue in names(res_list)) {
  x <- names(test_list[[tissue]][1:3])
  #x <- names(test_list[[tissue]][[1:3]]) # Error in test_list[[tissue]][[1:3]] :  recursive indexing failed at level 2
  lapply(x,function(x){ 

    # res_T1
    if (x == "res_T1"){     
      print(x)
      }
    # res_T2 

    # res_int
    
    }) 
}

returns:

[1] "res_T1"
[1] "res_T1"
[1] "res_T1"
[1] "res_T1"
[1] "res_T1"

So what I can see is that I am just accessing the name of each object, but not the object itself.
So print(x) prints "res_T1" 5 times (once for each tissue), instead of printing the 5 DESeqResults objects called "res_T1". If I can't even print it, I certainly can't transform it.

If I try the following instead of print(x), I just get a concatenated stream:

print(paste(res_list, tissue, x, sep="$")) # i.e. res_list$Duodenum$res_T1

Another idea is to use an incremental counter to step through res_list[[ x ]][ y ] and process one object at a time, but I'm not even sure whether I'll hit the same problem.

I hate nested lists, but it seemed like the most efficient way of storing all this data in 1 place, and it's what I have now sooooo....

Thanks in advance.

Kenneth

res_list[[1]][1] does not extract the first object from the first list; rather, it extracts a one-item list containing the first object. To gain access to the object itself, you need to use [[ instead of [:

nested_list <- 
  list(
    a = list(aa = 1, ab = 2, ac = 3),
    b = list(ba = 4, bb = 5, bc = 6),
    c = list(ca = 7, cb = 8, cc = 9)
  )

nested_list[["a"]][1]
#> $aa
#> [1] 1

nested_list[["a"]][[1]]
#> [1] 1

Created on 2024-07-11 with reprex v2.0.2

And to confirm: You'd like a solution that only uses base R, is that right?
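If base R is acceptable, the "lapply in an lapply" idea from the question can be sketched roughly as below. This is only an illustration under stated assumptions: plain data frames stand in for the DESeqResults objects, and format_res is a hypothetical transformation function, not anything from DESeq2.

```r
# Stand-in data: plain data frames play the role of DESeqResults objects.
make_res <- function() {
  data.frame(gene = c("g1", "g2", "g3"), padj = c(0.20, 0.04, 0.01))
}
res_list <- list(
  Duodenum = list(res_T1 = make_res(), res_T2 = make_res(), res_int = make_res()),
  Liver    = list(res_T1 = make_res(), res_T2 = make_res(), res_int = make_res())
)

# Hypothetical transformation: keep significant rows, ordered by padj.
format_res <- function(res) {
  df <- as.data.frame(res)
  df <- df[df$padj < 0.05, , drop = FALSE]
  df[order(df$padj), , drop = FALSE]
}

# The outer lapply walks the tissues, the inner lapply walks the results.
# Each inner element is handed to format_res as the object itself, not its name.
res_formatted_list <- lapply(res_list, function(tissue) lapply(tissue, format_res))

res_formatted_list[["Duodenum"]][["res_T1"]]
```

Because lapply keeps names, the new list has the same tissue/result layout as the original.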


Hi

Thank you, I didn't realise about the brackets. Any efficient solution will do as long as I can transform each deseq object first to a dataframe/tibble/matrix, then perform further transformations, then store them in a new list object. I'm familiar with Tidyverse.

Also, quickly adding the brackets produces a 'recursive indexing failed' error (added to the snippet above).

Maybe a list object was not the right data structure here.

Regards,
Kenneth

Tibbles allow list columns, which make it possible to have tables that contain any type of object in a single table "cell". That in turn would allow access to all the tools available through the tidyverse. Is there a reason you would like the final object to be a list?

The [[ selection function only allows extraction of a single list element, so can only take an index or name as input.
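A small base R sketch of why [[1:3]] fails, reusing the toy nested_list from earlier in the thread. When [[ is given a vector, R treats it as a path to walk down level by level rather than as a set of elements to pick.

```r
nested_list <- list(
  a = list(aa = 1, ab = 2, ac = 3),
  b = list(ba = 4, bb = 5, bc = 6)
)

# `[` with a vector of indices returns a (shorter) list.
first_two <- nested_list[["a"]][1:2]

# `[[` with a vector indexes recursively: c("a", "ab") means [["a"]][["ab"]].
second_of_a <- nested_list[[c("a", "ab")]]

# So [[1:3]] on the inner list is read as [[1]][[2]][[3]], which fails because
# nested_list$a[[1]] is just the number 1 and has no second element.
attempt <- tryCatch(nested_list[["a"]][[1:3]], error = function(e) e)
inherits(attempt, "error")
```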

No, no reason other than wanting the data all in one container with easy coding access... which isn't happening so far. At this point I'd nearly put it in a database instead.

Thanks; in that case, in what form do you gain access to the DESeqResults objects? And do you gain access to them individually, tissue by tissue?

First, I have a list of 5 deseqdatasets (1 per tissue):

names(dds_list)
[1] "Duodenum" "Muscle"   "Jejunum"  "Liver"    "Ileum" 
class(dds_list$Duodenum)
[1] "DESeqDataSet"
attr(,"package")
[1] "DESeq2"

Then I am creating 3 sets of results per tissue and storing them as nested lists in res_list

res_list <- list()
for (name in names(dds_list)) {
  dds <- DESeq(dds_list[[name]])
  res_trial_1 <- results(dds, contrast = c("condition","Low","High"))
  res_trial_2 <- results(dds, contrast = list(c("trial_2_vs_1","conditionLow.trial2")))
  res_interaction <- results(dds, name = "conditionLow.trial2")
  res_list[[name]] <- list(res_T1 = res_trial_1, res_T2 = res_trial_2, res_int = res_interaction)
}

The easiest way to do this I think is to name each result as a concatenated string (i.e. tissue_result, e.g. duodenum_T1) then I can store them all in the top level list (no nesting) and avoid the problem altogether. I've created a function that does all the transformations I need. So the next step would just be a repeat of what I've done already.

res_formatted_list <- list()
for (name in names(res_list)) {

  # 1: run my transformation function on each element
  # 2: save the results in res_formatted_list[[name]]
}

Or... maybe lapply can be used easily enough now if I have a single-level list and a function created.
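For what it's worth, the single-level idea might look something like the sketch below. It assumes plain data frames in place of the DESeqResults objects and a made-up format_res function. unlist(..., recursive = FALSE) removes one level of nesting and joins the names with a dot, so the concatenated names come for free.

```r
# Stand-in nested list: data frames instead of DESeqResults objects.
res_list <- list(
  Duodenum = list(res_T1 = data.frame(lfc = c(1.5, -0.2)),
                  res_T2 = data.frame(lfc = c(0.3, 2.1))),
  Liver    = list(res_T1 = data.frame(lfc = c(-1.1, 0.4)),
                  res_T2 = data.frame(lfc = c(0.9, -2.5)))
)

# Remove one level of nesting; names become "<tissue>.<result>".
flat_list <- unlist(res_list, recursive = FALSE)
names(flat_list)

# Hypothetical transformation: order rows by absolute log fold change.
format_res <- function(df) df[order(abs(df$lfc), decreasing = TRUE), , drop = FALSE]

# A single lapply is enough once the list is flat.
res_formatted_list <- lapply(flat_list, format_res)
res_formatted_list[["Liver.res_T2"]]
```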

Without having direct access to your data, here's a stab at a workflow that produces a table corresponding to dds_list that uses list columns:

dds_tbl <- tibble(master = dds_list)
dds_tbl

dds_tbl <-
  dds_tbl |> 
  mutate(tissue = names(master)) |> 
  mutate(dataset = map(tissue, \(name) master[[name]])) |> 
  rowwise() |> 
  mutate(dds = DESeq(dataset)) |> 
  mutate(T1 = results(dds, contrast = c("condition","Low","High"))) |>
  mutate(T2 = results(dds, contrast = list(c("trial_2_vs_1","conditionLow.trial2")))) |>
  mutate(int = results(dds, name = "conditionLow.trial2")) |> 
  ungroup()
dds_tbl

and assuming each of the T1, T2, and int results might require its own transformation function, named along the lines of tr_fn_T1:

dds_tbl <-
  dds_tbl |> 
  mutate(altT1 = map(T1, tr_fn_T1))

etc. (If one transformation function works for all three, then a slightly different workflow would be better.)

List columns require a little more care than base R column types, but I think they are worth the effort. Working with them often relies on the map*() family of functions, which are tidyverse relatives of lapply().
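If one transformation function does work for all three result columns, one possible shape for that "slightly different workflow" is across() combined with map(). This is a sketch only: the data frames are stand-ins for real results, and keep_significant is a hypothetical function.

```r
library(dplyr)
library(purrr)
library(tibble)

# Stand-in table: each list-column cell holds a small data frame.
tbl <- tibble(
  tissue = c("Duodenum", "Liver"),
  T1 = list(data.frame(padj = c(0.2, 0.01)), data.frame(padj = c(0.03, 0.5))),
  T2 = list(data.frame(padj = c(0.04, 0.6)), data.frame(padj = c(0.7, 0.02)))
)

# Hypothetical transformation shared by all result columns.
keep_significant <- function(df) df[df$padj < 0.05, , drop = FALSE]

# across() visits each chosen column; map() visits each cell in that column.
tbl_formatted <- tbl |>
  mutate(across(c(T1, T2), \(col) map(col, keep_significant)))

tbl_formatted$T1[[1]]
```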


Thank you for that. There will be a few things to debug, and a lot to learn in this snippet. For example, the following error occurs at the mutate(dds = DESeq(dataset)) step:

Error in `mutate()`:
ℹ In argument: `dds = DESeq(dataset)`.
ℹ In row 1.
Caused by error:
! `dds` must be a vector, not a <DESeqDataSet> object.
ℹ Did you mean: `dds = list(DESeq(dataset))` ?

Again, I was struggling to look inside the tibble until I realised it is the same access syntax i.e. dds_tbl[[1]][[1]], dds_tbl$master$Duodenum. Clearly there are gaps in my basics. Do you have any links to resources that can tie this all together? i.e. tibbles + list columns + map()+applying functions.

I can't say it has this exact pattern, but in general, R for Data Science is a great reference material for working with complex data.


I had a feeling there might be errors like this. If you do as the error suggests and wrap list() around the call to DESeq(), that should fix it, and similarly when another such error pops up; they are easy to run into when working with list columns.
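Here is a toy version of the same pattern that runs without DESeq2, using lm() as a stand-in for any function that returns a non-vector object. The data are made up; the point is only that the result has to be wrapped in list() so it can sit in a single cell of a list column.

```r
library(dplyr)
library(tibble)

# Stand-in table: one small data set per tissue, stored in a list column.
tbl <- tibble(
  tissue = c("Duodenum", "Liver"),
  dataset = list(
    data.frame(x = 1:4, y = c(2.1, 3.9, 6.2, 7.8)),
    data.frame(x = 1:4, y = c(1.0, 1.9, 3.2, 3.9))
  )
)

# Under rowwise(), `dataset` refers to the single data frame in the current row.
# lm() returns a model object rather than a vector, so it is wrapped in list().
fits <- tbl |>
  rowwise() |>
  mutate(fit = list(lm(y ~ x, data = dataset))) |>
  ungroup()

class(fits$fit[[1]])
```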


Wow, it is crazy how powerful map and tibble list columns are. I have it working with a few tweaks.

So far this works.

dds_tbl <- tibble(master = dds_list)    
dds_tbl <-
  dds_tbl |> 
  mutate(tissue = names(master)) |> 
  mutate(dataset = map(tissue, \(name) master[[name]])) |> 
  rowwise() |>
  mutate(dds = list(DESeq(dataset))) |> 
  mutate(T1 = list(results(dds, contrast = c("condition","Low","High")))) |>
  mutate(T2 = list(results(dds, contrast = list(c("trial_2_vs_1","conditionLow.trial2"))))) |>
  mutate(int = list(results(dds, name = "conditionLow.trial2"))) |> 
  ungroup()
dds_tbl <-
  dds_tbl |> 
  mutate(altT1 = map(T1, format_deseq2_results))

When I extract a list column, I can't tell which element belongs to which tissue (apart from the order). I'd like to be able to extract a specific tissue from a specific list column.

dds_tbl$altT1 # returns all of the altT1 list column, displays indexes [1] through [5], but no tissue names

Do I apply filter/select to specifically extract say Duodenum, altT1?

Thanks

You can add tissue names to the list columns, like this:

library(tidyverse)

tibble(x = letters[1:3], y = list(1:3)) |> 
  mutate(z = set_names(y, x)) -> temp

temp
#> # A tibble: 3 × 3
#>   x     y         z           
#>   <chr> <list>    <named list>
#> 1 a     <int [3]> <int [3]>   
#> 2 b     <int [3]> <int [3]>   
#> 3 c     <int [3]> <int [3]>

temp$z$a
#> [1] 1 2 3

Created on 2024-07-12 with reprex v2.0.2

I would have used data and names more similar to your context, but can't at the moment. Hopefully this is helpful.

Hi,

Actually, no need to name anything else because I can select/filter easily.

dds_tbl |>
  filter(tissue == "Duodenum") |>
  select(altT1) |> 
  unlist() |>
  as.data.frame(check.names = FALSE) |>
  head()

Returns:

I would like the output as a data frame without the column name prefixes. How do I stop the "altT1." prefix being added to every column name? I know how to remove it after it is added, but can I prevent the prefix being added in the first place?

Regards,
Kenneth

PS: I'm noticing this is all very memory usage heavy.

Yes, but it may depend on the transformation you applied: is there one function that applies to T1, T2, and int, or one for each? And I assume you modified the code you shared earlier. If so, what is the current version?

Current Code works great. Have tested, it is doing exactly what I want. Same transformation for all.

So it is only a minor change here, but it is just a handy one to know: how to stop prefixes being added to data frame columns?

Otherwise, I'm done.

# DIFFERENTIAL EXPRESSION ANALYSIS:
dds_tbl <- tibble(dds = dds_list)
dds_tbl <-
  dds_tbl |> 
  mutate(tissue = names(dds)) |> 
  rowwise() |>
  mutate(deseq = list(DESeq(dds))) |> 
  mutate(T1 = list(results(deseq, contrast = c("condition","Low","High")))) |>
  mutate(T2 = list(results(deseq, contrast = list(c("trial_2_vs_1","conditionLow.trial2"))))) |>
  mutate(int = list(results(deseq, name = "conditionLow.trial2"))) |> 
  ungroup() |>
  mutate(T1 = map(T1, format_deseq2_results)) |>
  mutate(T2 = map(T2, format_deseq2_results)) |>
  mutate(int = map(int, format_deseq2_results)) |>
  relocate(tissue)
dds_tbl

Output as desired:

# Extract results
dds_tbl |>
  filter(tissue == "Duodenum") |>
  select(T1) |> 
  unlist() |>
  as.data.frame() |> # is there an option to include here to prevent prefix creation?
  rename_with(~ gsub("\\w*\\.", "", .x)) |>     # this line resolves the prefix after it is created
  head()

Output as desired:

select(T1) is the culprit, I think: when you select a column from a table (which is a list), you're applying the [ function instead of the [[ function; the analogue of [[ is pull(T1). Does that help?
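A toy illustration of the difference, with plain data frames standing in for the formatted results. The last step shows, in isolation, how as.data.frame() builds prefixed names when it is handed a named list whose element is itself a data frame; treat that as a likely explanation for the prefix rather than a confirmed diagnosis of the pipeline above.

```r
library(dplyr)
library(tibble)

tbl <- tibble(
  tissue = c("Duodenum", "Liver"),
  T1 = list(
    data.frame(gene = c("g1", "g2"), lfc = c(1.5, -0.2)),
    data.frame(gene = c("g1", "g2"), lfc = c(-1.1, 0.4))
  )
)

# select() behaves like `[`: the result is still a tibble with a column named T1.
via_select <- tbl |> filter(tissue == "Duodenum") |> select(T1)

# pull() behaves like `[[`: the result is the column itself, a list of length one.
via_pull <- tbl |> filter(tissue == "Duodenum") |> pull(T1)
duodenum_T1 <- via_pull[[1]]

# pull() can also attach names taken from another column.
t1_by_tissue <- tbl |> pull(T1, name = tissue)
t1_by_tissue[["Liver"]]

# A named wrapper around a data frame is what produces the "T1." prefix.
prefixed <- as.data.frame(list(T1 = duodenum_T1))
names(prefixed)
```

With the names attached by pull(T1, name = tissue), a specific tissue can be picked out without filtering first.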

Perfect!

# Extract results
dds_tbl |>
  filter(tissue == "Duodenum") |>
  pull(T1) |> 
  unlist() |>
  as.data.frame() |>
  head()

Thank you very much, it would've taken me a long, long time to figure all that out.

On reflection, I think I can see the benefit of list columns within tibbles now.

  • Nested lists: Involve creating deeper and deeper nested lists (like the tree structure I had at the beginning), which become more and more confusing to access.

  • List columns: Allow creation of grouping variables for filtering (like tissue), and then creation of a new column for each of what would have been an element in a nested list. Access is then so much simpler and more intuitive.
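To make the comparison concrete, here is a rough sketch of turning an existing nested list into a tibble with list columns. The data frames are stand-ins for the real results. map() with a string argument extracts the element of that name from each tissue.

```r
library(dplyr)
library(purrr)
library(tibble)

# Stand-in nested list: tissue -> result name -> data frame.
res_list <- list(
  Duodenum = list(res_T1 = data.frame(lfc = 1.5),  res_T2 = data.frame(lfc = 0.3)),
  Liver    = list(res_T1 = data.frame(lfc = -1.1), res_T2 = data.frame(lfc = 0.9))
)

# One row per tissue, one list column per result type.
res_tbl <- tibble(
  tissue = names(res_list),
  T1 = map(res_list, "res_T1"),
  T2 = map(res_list, "res_T2")
)

# Access becomes filter + pull instead of chained [[ ]] calls.
liver_T2 <- res_tbl |>
  filter(tissue == "Liver") |>
  pull(T2) |>
  pluck(1)
liver_T2
```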

That was next level. Thank you


Yes, I have it and I use it a lot, but the specific application here was just a bit too much.

Yes, it seems that's the trade off — for now, at least!