Hi all,
EDIT: I have managed to solve this surprisingly. by making the original tibble longer so that all deseqresult objects are in a single column, then I can do what I need on all.
#################################################
ORIGINAL POST:
Its getting complex now. I have a tibble called "res", with 1 row and 5 named list columns. The content of each named list column is a single DESeqResults object (S4) containing ~15,000 rows (genes) and 6 columns. I want to extract 1 row (a gene of interest) from each of the 5 DESeqResults objects. I can do it, with alot of cut and paste, however I'm sure there is a prettier function that could be written instead, and that's where I'm getting stuck.
I will try to give as much relevant structural info as possible. It is difficult to produce synthetic data for you:
The gene + tissue of interest:
my_gene <- "ENSSSCG00000001398"
my_tissue <- "Duodenum"
The list column names:
> names(res) # list column names
[1] "res_High_v_Low_in_T1" "res_High_v_Low_in_T2" "res_T1_v_T2_in_High" "res_T1_v_T2_in_Low"
"res_trial_interaction"
The tibble:
> res
# A tibble: 1 × 5
res_High_v_Low_in_T1 res_High_v_Low_in_T2 res_T1_v_T2_in_High res_T1_v_T2_in_Low
res_trial_interaction
<named list> <named list> <named list> <named list> <named list>
1 <DESqRslt[,6]> <DESqRslt[,6]> <DESqRslt[,6]> <DESqRslt[,6]> <DESqRslt[,6]>
How I extract a DESeqResults object:
> head(deseq2_results$res_High_v_Low_in_T1$Duodenum, 3)
log2 fold change (MLE): condition_Low_vs_High effect
Wald test p-value: condition_Low_vs_High effect
DataFrame with 3 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue padj
<numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
ENSSSCG00000028996 25296.563 0.0696311 0.114562 0.607803 0.543318 0.999646
ENSSSCG00000005267 934.190 -0.1282381 0.142367 -0.900755 0.367719 0.999646
ENSSSCG00000005269 285.571 0.3460855 0.251351 1.376901 0.168543 0.999646
This is where I am stuck: What I need is a function that applies across all columns without having to name them, but I can't figure it out without copy pasting and hard coding the column names. The following code gives me the transformation I need (not good, clearly):
res <- res %>%
# transform column 1:
mutate(res_High_v_Low_in_T1 = purrr::map(res_High_v_Low_in_T1,
~deseq2_results$res_High_v_Low_in_T1$Duodenum[my_gene,])) %>%
# transform column 2:
mutate(res_High_v_Low_in_T2 = purrr::map(res_High_v_Low_in_T2,
~deseq2_results$res_High_v_Low_in_T2$Duodenum[my_gene,])) %>%
# transform column 3:
mutate(res_T1_v_T2_in_High = purrr::map(res_T1_v_T2_in_High,
~deseq2_results$res_T1_v_T2_in_High$Duodenum[my_gene,])) %>%
# transform column 4:
mutate(res_T1_v_T2_in_Low = purrr::map(res_T1_v_T2_in_Low,
~deseq2_results$res_T1_v_T2_in_Low$Duodenum[my_gene,])) %>%
# transform column 5:
mutate(res_trial_interaction = purrr::map(res_trial_interaction,
~deseq2_results$res_trial_interaction$Duodenum[my_gene,]))
Once this is complete, I do the following:
# pivot
res <- res %>%
tidyr::pivot_longer(
cols = !c(),
names_to = "contrast",
values_to = "deseq_res"
)
# create tibble of deseq2 results across all contrasts for my_gene
tissue_res <- tibble()
for (i in res$contrast) {
# extract each contrast
x <- res %>%
filter(contrast == i) %>%
pull(deseq_res) %>%
data.frame() %>%
rename_with(~ gsub("\\w*\\.", "", .x)) %>%
as_tibble() %>%
mutate(contrast = i, tissue = my_tissue) %>% # add tissue and contrast
mutate(fc = 2^log2FoldChange) %>% # calculate fold change
select(tissue, contrast, baseMean, fc, padj) # select and arrange columns
# bind rows
tissue_res <- bind_rows(tissue_res, x)
}
tissue_res
With the final output as:
> tissue_res
# A tibble: 5 × 5
tissue contrast baseMean fc padj
<chr> <chr> <dbl> <dbl> <dbl>
1 Duodenum res_High_v_Low_in_T1 841. 1.59 1.44e- 4
2 Duodenum res_High_v_Low_in_T2 841. 0.970 9.95e- 1
3 Duodenum res_T1_v_T2_in_High 841. 0.836 1.74e- 1
4 Duodenum res_T1_v_T2_in_Low 841. 0.511 2.65e-11
5 Duodenum res_trial_interaction 841. 0.611 3.14e- 1
Thanks all in advance,
Kenneth