Just to finish this off....
Now that we know it is the same transformation for everything.... what is the slightly different workflow that would be better?
"Better" depends on what makes the data easier to work with, but since T1, T2, and int play similar roles, a longer form of the table might be appropriate, which at the same time allows for a single application of the transformation function:
# DIFFERENTIAL EXPRESSION ANALYSIS:
dds_tbl <- tibble(dds = dds_list)
dds_tbl <-
  dds_tbl |>
  mutate(tissue = names(dds)) |>
  rowwise() |>
  mutate(deseq = list(DESeq(dds))) |>
  mutate(T1 = list(results(deseq, contrast = c("condition","Low","High")))) |>
  mutate(T2 = list(results(deseq, contrast = list(c("trial_2_vs_1","conditionLow.trial2"))))) |>
  mutate(int = list(results(deseq, name = "conditionLow.trial2"))) |>
  ungroup() |>
  # collect values of T1, T2, and int into a single column
  pivot_longer(c(T1:int), names_to = "type", values_to = "result") |>
  mutate(result = map(result, format_deseq2_results)) |>
  relocate(tissue)
dds_tbl
You would then just have to add type in addition to tissue in your filtering workflow.
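To make the long form concrete without running DESeq2, here is a minimal sketch on toy data. The helper toy_result() and the tissue name "Ileum" are made up for illustration; small data frames stand in for the real results tables. It shows the three list columns collapsing into one, and a single tissue/type pair being pulled back out.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)

# Hypothetical stand-in for a formatted DESeq2 results table
toy_result <- function(shift) {
  data.frame(
    gene = c("g1", "g2", "g3"),
    log2FoldChange = c(-1, 0, 1) + shift
  )
}

# Wide form: one row per tissue, one list column per result type
wide_tbl <- tibble(
  tissue = c("Duodenum", "Ileum"),
  T1  = list(toy_result(0), toy_result(10)),
  T2  = list(toy_result(1), toy_result(11)),
  int = list(toy_result(2), toy_result(12))
)

# Long form: one row per tissue/type pair, results in a single list column
long_tbl <- wide_tbl |>
  pivot_longer(T1:int, names_to = "type", values_to = "result")

# Filter on both keys, then extract the data frame from the list column
duodenum_t1 <- long_tbl |>
  filter(tissue == "Duodenum", type == "T1") |>
  pull(result) |>
  pluck(1)
```

Here pluck(1) takes the single remaining element out of the list column, so duodenum_t1 is an ordinary data frame.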
One option could be to abstract the repeated transformation into its own function and apply it to each list item with lapply() or purrr::map().
# dds: a DESeqDataSet that has already been run through DESeq()
myTransformation <- function(dds) {
  out <- list(
    res_trial_1 = results(dds, contrast = c("condition","Low","High")),
    res_trial_2 = results(dds, contrast = list(c("trial_2_vs_1","conditionLow.trial2"))),
    res_interaction = results(dds, name = "conditionLow.trial2")
  )
  return(out)
}
# map() over a named list keeps the names, so no separate names<- step is needed
output <- purrr::map(dds_list, myTransformation)
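If the nested list from this approach later needs to become a tibble with a list column, one possible route is sketched below on toy data. toy_transformation() is a hypothetical stand-in for myTransformation(), and the tissue name "Ileum" is made up; only the shape of the output (a named list of named lists) is taken from the code above.

```r
library(purrr)
library(dplyr)
library(tibble)

# Hypothetical stand-in for myTransformation(): a named list of three tables
toy_transformation <- function(x) {
  list(
    res_trial_1     = data.frame(gene = c("g1", "g2"), value = c(x, x + 1)),
    res_trial_2     = data.frame(gene = c("g1", "g2"), value = c(x, x + 1) * 2),
    res_interaction = data.frame(gene = c("g1", "g2"), value = c(x, x + 1) * 3)
  )
}

# Toy input in place of dds_list
toy_list <- list(Duodenum = 1, Ileum = 2)

# Named list in, named list out
output <- map(toy_list, toy_transformation)

# Flatten to one row per tissue/type pair, keeping each table in a list column
output_tbl <- imap(output, function(res, tissue) {
  tibble(tissue = tissue, type = names(res), result = unname(res))
}) |>
  bind_rows()
```

This gives the same long layout as the pivot_longer() approach, so the same filter-on-tissue-and-type workflow would apply to either.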
Sounds like the 3 columns T1, T2, and int become 1 column "results", and each [tissue,results] cell would then contain a list called "type" containing the 3 data frames. I'll take a look at it tomorrow. It sounds like a useful technique to learn, although in this case I would be saving just one line of code while making the code less readable and harder to debug.
This sounds like what I was looking for originally. And, it would be useful to have a generic function snippet that I can drop into any project. I'll also have a look at this tomorrow but I like the idea of the list columns instead of a hierarchical list now (what I originally had).
Also, adding this would mean there are 2 transformation functions written.
Thanks
Yes, in your case, the trade-off may not be worthwhile.
Just for completeness, this is the final code.
# Store list of 5 DESeqDataSet objects in a tibble
diff_exp_analysis <- tibble(dds = dds_list) |>
  mutate(tissue = names(dds)) |>
  relocate(tissue) |>
  arrange(tissue)
# Differential expression analysis
diff_exp_analysis <-
  diff_exp_analysis |>
  rowwise() |>
  mutate(deseq = list(DESeq(dds))) |>
  mutate(T1 = list(results(deseq, contrast = c("condition","Low","High")))) |>
  mutate(T2 = list(results(deseq, contrast = list(c("trial_2_vs_1","conditionLow.trial2"))))) |>
  mutate(int = list(results(deseq, name = "conditionLow.trial2"))) |>
  ungroup() |>
  tidyr::pivot_longer(c(T1:int), names_to = "contrast", values_to = "result") |>
  mutate(result = map(result, format_deseq2_results))
# Extract results
diff_exp_analysis |>
  filter(tissue == "Duodenum" & contrast == "T1") |>
  # pull() with no argument takes the last column, here result
  pull() |>
  unlist() |>
  as.data.frame() |>
  as_tibble(rownames = "gene")
Thanks all for your help. Lots learned.
Actually, there is one more thing, for completeness. It is unrelated to the original question but relevant nonetheless. This tibble with list columns will now be the source for downstream analysis within R. However, I can see that saving a tibble with list columns is not straightforward.
I have tried:
saveRDS(diff_exp_analysis, "path/to/file")
test <- readRDS("path/to/file")
identical(diff_exp_analysis, test)
> FALSE
The saved .rds file is 200 MB.
I am also looking at save(), load(), and attach().
What do you recommend?
This is surprising, and may deserve its own topic. Could you start a new one?