Write a function to apply across tibble list columns that contain a single S4 object

bioinfguru · July 29, 2024, 3:03pm

Hi all,

EDIT: I have managed to solve this surprisingly. by making the original tibble longer so that all deseqresult objects are in a single column, then I can do what I need on all.

#################################################

ORIGINAL POST:
Its getting complex now. I have a tibble called "res", with 1 row and 5 named list columns. The content of each named list column is a single DESeqResults object (S4) containing ~15,000 rows (genes) and 6 columns. I want to extract 1 row (a gene of interest) from each of the 5 DESeqResults objects. I can do it, with alot of cut and paste, however I'm sure there is a prettier function that could be written instead, and that's where I'm getting stuck.

I will try to give as much relevant structural info as possible. It is difficult to produce synthetic data for you:

The gene + tissue of interest:

my_gene <- "ENSSSCG00000001398"
my_tissue <- "Duodenum"

The list column names:

> names(res) # list column names
[1] "res_High_v_Low_in_T1"  "res_High_v_Low_in_T2"  "res_T1_v_T2_in_High"   "res_T1_v_T2_in_Low"    
"res_trial_interaction"

The tibble:

> res
# A tibble: 1 × 5
  res_High_v_Low_in_T1 res_High_v_Low_in_T2 res_T1_v_T2_in_High res_T1_v_T2_in_Low 
res_trial_interaction
  <named list>         <named list>         <named list>        <named list>       <named list>         
1 <DESqRslt[,6]>       <DESqRslt[,6]>       <DESqRslt[,6]>      <DESqRslt[,6]>     <DESqRslt[,6]>

How I extract a DESeqResults object:

> head(deseq2_results$res_High_v_Low_in_T1$Duodenum, 3)
log2 fold change (MLE): condition_Low_vs_High effect 
Wald test p-value: condition_Low_vs_High effect 
DataFrame with 3 rows and 6 columns
                    baseMean log2FoldChange     lfcSE      stat    pvalue      padj
                   <numeric>      <numeric> <numeric> <numeric> <numeric> <numeric>
ENSSSCG00000028996 25296.563      0.0696311  0.114562  0.607803  0.543318  0.999646
ENSSSCG00000005267   934.190     -0.1282381  0.142367 -0.900755  0.367719  0.999646
ENSSSCG00000005269   285.571      0.3460855  0.251351  1.376901  0.168543  0.999646

This is where I am stuck: What I need is a function that applies across all columns without having to name them, but I can't figure it out without copy pasting and hard coding the column names. The following code gives me the transformation I need (not good, clearly):

res <- res %>% 
# transform column 1:
  mutate(res_High_v_Low_in_T1 = purrr::map(res_High_v_Low_in_T1,  
~deseq2_results$res_High_v_Low_in_T1$Duodenum[my_gene,])) %>% 
# transform column 2:
  mutate(res_High_v_Low_in_T2 = purrr::map(res_High_v_Low_in_T2, 
~deseq2_results$res_High_v_Low_in_T2$Duodenum[my_gene,])) %>%
# transform column 3:
  mutate(res_T1_v_T2_in_High = purrr::map(res_T1_v_T2_in_High, 
~deseq2_results$res_T1_v_T2_in_High$Duodenum[my_gene,])) %>%
# transform column 4:
  mutate(res_T1_v_T2_in_Low = purrr::map(res_T1_v_T2_in_Low, 
~deseq2_results$res_T1_v_T2_in_Low$Duodenum[my_gene,])) %>%
# transform column 5:
  mutate(res_trial_interaction = purrr::map(res_trial_interaction, 
~deseq2_results$res_trial_interaction$Duodenum[my_gene,]))

Once this is complete, I do the following:

# pivot
res <- res %>%
  tidyr::pivot_longer(
    cols = !c(),
    names_to = "contrast",
    values_to = "deseq_res"
  )

# create tibble of deseq2 results across all contrasts for my_gene
tissue_res <- tibble()
for (i in res$contrast) {
  # extract each contrast
  x <- res %>% 
    filter(contrast == i) %>% 
    pull(deseq_res) %>% 
    data.frame() %>% 
    rename_with(~ gsub("\\w*\\.", "", .x)) %>% 
    as_tibble() %>%
    mutate(contrast = i, tissue = my_tissue) %>%      # add tissue and contrast
    mutate(fc = 2^log2FoldChange) %>%                 # calculate fold change
    select(tissue, contrast, baseMean, fc, padj)      # select and arrange columns

  # bind rows
  tissue_res <- bind_rows(tissue_res, x)
}
tissue_res

With the final output as:

> tissue_res
# A tibble: 5 × 5
  tissue   contrast              baseMean    fc     padj
  <chr>    <chr>                    <dbl> <dbl>    <dbl>
1 Duodenum res_High_v_Low_in_T1      841. 1.59  1.44e- 4
2 Duodenum res_High_v_Low_in_T2      841. 0.970 9.95e- 1
3 Duodenum res_T1_v_T2_in_High       841. 0.836 1.74e- 1
4 Duodenum res_T1_v_T2_in_Low        841. 0.511 2.65e-11
5 Duodenum res_trial_interaction     841. 0.611 3.14e- 1

Thanks all in advance,

Kenneth

nirgrahamuk · July 29, 2024, 3:21pm

I would guess that you could leverage mutate's supporting function across()

bioinfguru · July 29, 2024, 3:23pm

Thank you I am looking at that at the moment. But I have a more simple way to structure this so I'm gonna delete this post and try again...and see if I can move forward

bioinfguru · July 29, 2024, 4:51pm

Solution:

# select gene
my_gene <- "ENSSSCG00000001398"

# pivot + extract
tib <- deseq2_results %>% select(!"dds") %>%
  tidyr::pivot_longer(
    cols = !c(tissue),
    names_to = "contrast",
    values_to = "deseqresults"
  ) %>% 
  mutate(baseMean = purrr::map_dbl(deseqresults, function(x) x %>% as_tibble(rownames = "gene") %>% 
filter(gene == my_gene) %>% pull(baseMean))) %>%
  mutate(log2FoldChange = purrr::map_dbl(deseqresults, function(x) x %>% as_tibble(rownames = "gene") 
%>% filter(gene == my_gene) %>% pull(log2FoldChange))) %>%
  mutate(padj = purrr::map_dbl(deseqresults, function(x) x %>% as_tibble(rownames = "gene") %>% 
filter(gene == my_gene) %>% pull(padj))) %>%
  mutate(fc = 2^log2FoldChange) %>%
  select(tissue, contrast, baseMean, fc, padj) %>%
  arrange(tissue)

> head(tib)
# A tibble: 6 × 5
  tissue   contrast              baseMean    fc     padj
  <chr>    <chr>                    <dbl> <dbl>    <dbl>
1 Duodenum res_High_v_Low_in_T1      841. 1.59  1.44e- 4
2 Duodenum res_High_v_Low_in_T2      841. 0.970 9.95e- 1
3 Duodenum res_T1_v_T2_in_High       841. 0.836 1.74e- 1
4 Duodenum res_T1_v_T2_in_Low        841. 0.511 2.65e-11
5 Duodenum res_trial_interaction     841. 0.611 3.14e- 1
6 Ileum    res_High_v_Low_in_T1     1028. 1.30  1.00e+ 0

system · August 5, 2024, 4:52pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.