Transform objects inside nested lists

bioinfguru · July 13, 2024, 5:45pm

Just to finish this off.

Now that we know it is the same transformation for everything, what is the slightly different workflow that would be better?

dromano · July 13, 2024, 7:07pm

"Better" depends on what makes the data easier to work with, but since T1, T2, and int play similar roles, a longer form of the table might be appropriate, which at the same time allows for a single application of the transformation function:

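# Packages used below (format_deseq2_results() is defined earlier in the thread)
library(DESeq2)
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)

# DIFFERENTIAL EXPRESSION ANALYSIS: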
dds_tbl <- tibble(dds = dds_list)
dds_tbl <-
  dds_tbl |> 
  mutate(tissue = names(dds)) |> 
  rowwise() |>
  mutate(deseq = list(DESeq(dds))) |> 
  mutate(T1 = list(results(deseq, contrast = c("condition","Low","High")))) |>
  mutate(T2 = list(results(deseq, contrast = list(c("trial_2_vs_1","conditionLow.trial2"))))) |>
  mutate(int = list(results(deseq, name = "conditionLow.trial2"))) |> 
  ungroup() |>
  # collect values of T1, T2, and int into a single column
  pivot_longer(c(T1:int), names_to = "type", values_to = "result") |> 
  mutate(result = map(result, format_deseq2_results)) |>
  relocate(tissue)
dds_tbl

You would then just have to add type in addition to tissue in your filtering workflow.
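As a rough illustration of the resulting shape, here is a toy sketch that uses small data frames in place of DESeq2 results. The tissue names and values are made up, and `toy_tbl` and `toy_long` are hypothetical names. After `pivot_longer()` there is one row per tissue and type, and each `result` cell holds a single data frame:

```r
library(dplyr)
library(tidyr)
library(tibble)

# Toy stand-in for the wide table: one row per tissue, one list column per result type
toy_tbl <- tibble(
  tissue = c("Duodenum", "Liver"),
  T1  = list(data.frame(lfc = 1), data.frame(lfc = 2)),
  T2  = list(data.frame(lfc = 3), data.frame(lfc = 4)),
  int = list(data.frame(lfc = 5), data.frame(lfc = 6))
)

# Collect T1, T2, and int into a single list column, labelled by "type"
toy_long <- toy_tbl |>
  pivot_longer(T1:int, names_to = "type", values_to = "result")

# One row per tissue/type combination (2 tissues x 3 types)
nrow(toy_long)

# Filtering on both tissue and type narrows this down to a single row
toy_hit <- toy_long |>
  filter(tissue == "Duodenum", type == "T1")
toy_hit
```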

peernisse · July 13, 2024, 7:21pm

One option could be to abstract the repeated transformation into its own function and use it on each list item with lapply() or purrr::map().

myTransformation <- function(dds) {
    out <- list(
        res_trial_1 = results(dds, contrast = c("condition","Low","High")),
        res_trial_2 = results(dds, contrast = list(c("trial_2_vs_1","conditionLow.trial2"))),
        res_interaction = results(dds, name = "conditionLow.trial2")
    )
    
    return(out)
}

# map() over the named list itself; the names of dds_list carry over to output
output <- purrr::map(dds_list, myTransformation)
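A small side note, sketched with a toy list rather than real DESeq2 objects: `purrr::map()` keeps the names of a named list, so the output can be indexed by tissue name directly. `toy_list` and `toy_transformation()` are made-up stand-ins for `dds_list` and `myTransformation()`.

```r
library(purrr)

# Toy named list standing in for dds_list
toy_list <- list(Duodenum = 1:3, Liver = 4:6)

# Hypothetical stand-in for myTransformation(): returns a named list of results
toy_transformation <- function(x) {
  list(total = sum(x), average = mean(x))
}

# Mapping over the list itself preserves its names
toy_output <- map(toy_list, toy_transformation)

names(toy_output)
toy_output$Liver$total
```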

bioinfguru · July 14, 2024, 12:28am

Sounds like the 3 columns T1, T2, and int become 1 column "results". And each [tissue,results] cell would then contain a list called "type" containing the 3 data frames. I'll take a look at it tomorrow. It sounds like a useful technique to learn, although in this case I would be saving just one line of code while making the code less readable and harder to debug.

bioinfguru · July 14, 2024, 12:34am

This sounds like what I was looking for originally. And, it would be useful to have a generic function snippet that I can drop into any project. I'll also have a look at this tomorrow but I like the idea of the list columns instead of a hierarchical list now (what I originally had).

Also, with this added, there would be two transformation functions:

myTransformation()
format_deseq2_results()

Thanks

dromano · July 14, 2024, 12:48am

Yes, in your case, the trade-off may not be worthwhile.

bioinfguru · July 16, 2024, 3:45pm

Just for completeness, this is the final code.

# Store list of 5 deseqdatasets in a tibble
diff_exp_analysis <- tibble(dds = dds_list) %>%   
  mutate(tissue = names(dds)) %>%
  relocate(tissue) %>%
  arrange(tissue)

# Differential expression analysis
diff_exp_analysis <-
  diff_exp_analysis |> 
  rowwise() |>
  mutate(deseq = list(DESeq(dds))) |> 
  mutate(T1 = list(results(deseq, contrast = c("condition","Low","High")))) |>
  mutate(T2 = list(results(deseq, contrast = list(c("trial_2_vs_1","conditionLow.trial2"))))) |>
  mutate(int = list(results(deseq, name = "conditionLow.trial2"))) |> 
  ungroup() |>
  tidyr::pivot_longer(c(T1:int), names_to = "contrast", values_to = "result") |> 
  mutate(result = map(result, format_deseq2_results))

# Extract results
diff_exp_analysis |>
  filter(tissue == "Duodenum" & contrast == "T1") |>
  pull(result) |>
  unlist() |>
  as.data.frame() |>
  as_tibble(rownames = "gene")
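If the extraction step gets reused a lot, it could be wrapped in a small helper. This is only a sketch under assumptions: `get_result()` is a hypothetical name, and `toy_analysis` is a made-up table with plain data frames standing in for the formatted DESeq2 results.

```r
library(dplyr)
library(tibble)

# Hypothetical helper: return the single result stored for one tissue/contrast pair
get_result <- function(tbl, tissue_name, contrast_name) {
  hits <- tbl |>
    filter(tissue == tissue_name, contrast == contrast_name)
  stopifnot(nrow(hits) == 1)  # fail loudly on no match or on duplicates
  hits$result[[1]]
}

# Toy table standing in for diff_exp_analysis
toy_analysis <- tibble(
  tissue   = c("Duodenum", "Duodenum", "Liver"),
  contrast = c("T1", "T2", "T1"),
  result   = list(
    data.frame(log2FoldChange = c(1.5, -0.2), row.names = c("geneA", "geneB")),
    data.frame(log2FoldChange = c(0.3, 0.8), row.names = c("geneA", "geneB")),
    data.frame(log2FoldChange = c(-1.1, 2.0), row.names = c("geneA", "geneB"))
  )
)

# Pull one result out of the list column and move the row names into a column
toy_result <- get_result(toy_analysis, "Duodenum", "T1") |>
  as_tibble(rownames = "gene")
toy_result
```

Indexing with `[[1]]` takes the data frame out of the list column without flattening it.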

Thanks all for your help. Lots learned.

bioinfguru · July 16, 2024, 4:48pm

Actually, there is one more thing. It is unrelated to the original question, but relevant nonetheless. This tibble with list columns will now be the source for downstream analysis within R. However, I can see that saving a tibble with list columns is not straightforward.

I have tried:

saveRDS(diff_exp_analysis, "path/to/file")
test <- readRDS("path/to/file")
identical(diff_exp_analysis, test)
#> [1] FALSE

The saved .RDS file is 200 MB.

I am also looking at save(), load(), and attach().

What do you recommend?

dromano · July 17, 2024, 8:26pm

This is surprising, and may deserve its own topic. Could you start a new one?
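One guess, not checked against your objects: `identical()` compares environments by reference, and DESeq2 objects may carry environments (for example through a design formula or stored functions). If that is the case here, a round trip through `saveRDS()` and `readRDS()` could return FALSE even though the data themselves survive intact. A toy sketch of that effect, using a plain list with a made-up name `obj`:

```r
# An environment stored inside an object is rebuilt as a new environment on load
obj <- list(values = 1:3, env = new.env())

path <- tempfile(fileext = ".rds")
saveRDS(obj, path)
restored <- readRDS(path)

# The whole object is compared including its environment, which is a different
# environment object after reloading
identical(obj, restored)

# The plain data component is compared by value
identical(obj$values, restored$values)
```

Comparing the individual data components, rather than the whole tibble, may therefore be a more informative check.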
