Pairwise cor.test based on combinations of variables

jm_t · October 29, 2019, 2:32pm

Hi,

I would like to run pairwise cor.test on various columns of my dataset. Of course, I would like this to be dimension-aware, so that adding a new variable would not break my code.

Here is a reprex of what I have started to write. I can't access my dataset columns dynamically. I do not know if my syntax is wrong or if I am missing one step.

Also, if there is a package or function that does this, please let me know.

Regards,

jm

library(dplyr)
library(tidyr)
library(purrr)
library(tibble)

# Create our dummy data
my_data = as.data.frame(matrix(
  data = runif(400),
  ncol = 4
))
# define our names
my_names = letters[1:4]
colnames(my_data) = my_names
my_data = as_tibble(my_data)


# our pair wise combinations
my_comb = t(combn(my_names, 2))
colnames(my_comb) = c("ratio_1", "ratio_2")
my_comb = as_tibble(my_comb)

# Apply pair wise cor.test
my_comb %>% map2(
  .x = my_data[[.$ratio_1]],
  .y = my_data[[.$ratio_2]],
  .f = cor.test,
  alternative = "less", 
  conf.level = 0.99,
  method = "spearman"
)

mattwarkentin · October 29, 2019, 2:42pm

Hi @jm_t,

I would try the corrr package to see if it meets your needs.

my_data %>% 
  corrr::correlate()

This returns a tibble of correlations. Other helper functions in the package make it easier to play around with it to conform to your plotting/tabular needs:

# Stretch table to long-form, good for ggplot plotting
my_data %>% 
  corrr::correlate() %>% # can add diagonal=1 arg to specify default diagonal
  corrr::stretch()

# Remove the upper triangle
my_data %>% 
  corrr::correlate() %>% 
  corrr::shave()

Easy heatmap plotting:

library(ggplot2) # needed for ggplot(), aes() and geom_tile()
my_data %>% 
  corrr::correlate(diagonal = 1) %>% 
  corrr::stretch() %>% 
  ggplot(aes(x, y, fill = r)) +
  geom_tile()
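
One thing to keep in mind: correlate() defaults to Pearson correlations. Since the original question used Spearman, here is a minimal sketch that passes method = "spearman" and cross-checks one cell against base cor(). The seed and quiet = TRUE are my additions (for reproducibility and a quieter console), and the data just mirror the reprex above.

```r
library(tibble)
library(corrr)

set.seed(42)  # assumption: seed added only to make the sketch reproducible
my_data <- as_tibble(matrix(runif(400), ncol = 4,
                            dimnames = list(NULL, letters[1:4])))

# Spearman instead of the default Pearson
spearman_cors <- correlate(my_data, method = "spearman", quiet = TRUE)

# Cross-check the a-b cell (row "a", column b) against base R
base_ab <- cor(my_data$a, my_data$b, method = "spearman")
all.equal(spearman_cors$b[1], base_ab)
```

The same method argument can be used in any of the correlate() calls above.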

Yarnabrina · October 29, 2019, 2:45pm

You can also use corr.test from the psych package. See my answer here:

jm_t · October 29, 2019, 3:04pm

Thank you to both of you. I knew that something should be ready to use but I failed to find it.

For my own R knowledge, I am still curious to know what the error in my reprex is.

Regards,

jm

mattwarkentin · October 29, 2019, 3:30pm

Hi @jm_t,

Here is my attempt at re-working your code to get it to work as I think you want it to:

my_comb %>% 
  mutate(cors = map2(
    ratio_1, 
    ratio_2, 
    ~cor.test(my_data[[.x]], my_data[[.y]],
              alternative = "less", 
              conf.level = 0.99,
              method = "spearman"))
  )

This will add a new column to the my_comb data which is a list-column of the cor.test results. To inspect the first element in the list:

my_comb %>% 
  mutate(cors = map2(
    ratio_1, 
    ratio_2, 
    ~cor.test(my_data[[.x]], my_data[[.y]],
              alternative = "less", 
              conf.level = 0.99,
              method = "spearman"))
  ) %>% 
  pluck(., 'cors', 1)
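
As for why the original reprex errors, my reading (a sketch of the likely cause, not a full trace) is that .$ratio_1 is a vector holding all six names, while [[ extracts exactly one column at a time. The argument expression therefore fails before map2() gets a chance to iterate. Passing one name per call, which is what .x and .y do inside the lambda, works. The seed and the object names all_at_once and one_at_a_time are just for illustration.

```r
library(tibble)
library(purrr)

set.seed(1)  # assumption: seed added only to make the sketch reproducible
my_data <- as_tibble(matrix(runif(400), ncol = 4,
                            dimnames = list(NULL, letters[1:4])))
pairs <- combn(names(my_data), 2)
my_comb <- tibble(ratio_1 = pairs[1, ], ratio_2 = pairs[2, ])

# All six names at once: `[[` cannot turn this into a single column
all_at_once <- try(my_data[[my_comb$ratio_1]], silent = TRUE)
inherits(all_at_once, "try-error")

# One name per call: this is what map2() hands to the lambda
one_at_a_time <- map2(
  my_comb$ratio_1, my_comb$ratio_2,
  ~ cor.test(my_data[[.x]], my_data[[.y]], method = "spearman")
)
length(one_at_a_time)
```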

jm_t · October 29, 2019, 8:26pm

I tried to adapt the code so that I can run the test on every subset of the data set, but I am failing at piping and group by. I am starting to grasp the tidyverse but am not yet up to speed.

library(dplyr)
library(tidyr)
library(purrr)
library(tibble)
library(corrr)

# Create our dummy data
my_data = as.data.frame(matrix(
  data = runif(400),
  ncol = 4
))
# define our names
my_names = letters[1:4]
colnames(my_data) = my_names
my_data = as_tibble(my_data)
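my_cat = c("cat1", "cat2", "cat3", "cat4")
# repeat the labels to get one per row; recent tibble versions only recycle length-1 vectors
my_data$group = rep(my_cat, length.out = nrow(my_data))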

# our pair wise combinations
my_comb = t(combn(my_names, 2))
colnames(my_comb) = c("ratio_1", "ratio_2")
my_comb = as_tibble(my_comb)
my_comb %>% 
  merge(y = my_cat  ) %>%
  rename(my_group=y) %>%

#pseudo code that fails
  mutate(
    cors = pmap(
      my_data,
      ~cor.test(
          my_data[[.$my_group, .$ratio_1]], 
          my_data[[.$my_group, .$ratio_2]],
          alternative = "less", 
          conf.level = 0.99,
          method = "spearman"
      )
  )
)

mattwarkentin · October 29, 2019, 8:37pm

Hi @jm_t,

Could you clarify for me what exactly you are trying to do? Are you trying to get pairwise correlations for every variable for different subsets of the data?

mattwarkentin · October 29, 2019, 8:52pm

If my above guess is correct, I would suggest using the following approach:

my_data %>% 
  group_nest(group) %>% 
  mutate(cors = map(data, corrr::correlate),
         stretch = map(cors, corrr::stretch)) %>% 
  unnest(stretch)

Here is a breakdown of what this code is doing:

First off, %>% is just a way of passing the output of one function as the input to the next function (or, in the case of the first step, simply passing the data forward).
Next, group_nest() will group the data by your group variable, and nest them as data frames inside your data frame. This is a more advanced concept, but a very powerful one. Try just running my_data %>% group_nest(group) to see exactly what is happening here. The 25 observations that belong to each group are now stored in their own separate tibbles (a type of data frame), which makes it easier to apply functions to each data set.
mutate() creates new variables, so we add the column cors, which holds the pairwise correlations for every pair of variables, computed separately for each of the 4 groups. Then we create the variable stretch, which turns the correlation table from wide to long.
Lastly, we can use unnest(), which does the opposite of nesting. It takes the column of data frames and unpacks them back into our main data frame.

# A tibble: 64 x 6
   group data              cors             x     y           r
   <chr> <list>            <list>           <chr> <chr>   <dbl>
 1 cat1  <tibble [25 x 4]> <tibble [4 x 5]> a     a     NA     
 2 cat1  <tibble [25 x 4]> <tibble [4 x 5]> a     b      0.170 
 3 cat1  <tibble [25 x 4]> <tibble [4 x 5]> a     c      0.411 
 4 cat1  <tibble [25 x 4]> <tibble [4 x 5]> a     d      0.204 
 5 cat1  <tibble [25 x 4]> <tibble [4 x 5]> b     a      0.170 
 6 cat1  <tibble [25 x 4]> <tibble [4 x 5]> b     b     NA     
 7 cat1  <tibble [25 x 4]> <tibble [4 x 5]> b     c      0.0285
 8 cat1  <tibble [25 x 4]> <tibble [4 x 5]> b     d      0.133 
 9 cat1  <tibble [25 x 4]> <tibble [4 x 5]> c     a      0.411 
10 cat1  <tibble [25 x 4]> <tibble [4 x 5]> c     b      0.0285
# ... with 54 more rows

The results shown above present the pairwise (x-y) correlations (r) for every group. We can also relatively easily spin this into a plot, one for each group.

library(ggplot2)
my_data %>% 
  group_nest(group) %>% 
  mutate(cors = map(data, corrr::correlate),
         stretch = map(cors, corrr::stretch)) %>% 
  unnest(stretch) %>% 
  ggplot(aes(x, y, fill = r)) +
  geom_tile() +
  facet_wrap(~group)
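
One caveat: correlate() returns coefficients only, not the test itself. If you also need the p-values from cor.test for each pair within each group, here is a rough sketch that combines the pairwise approach from earlier in the thread with group_nest(). The helper name test_pairs, the seed, and the output column names rho and p_value are my own choices for illustration, not anything from a package.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)

set.seed(123)  # assumption: seed added only to make the sketch reproducible
my_names <- letters[1:4]
my_data <- as_tibble(matrix(runif(400), ncol = 4,
                            dimnames = list(NULL, my_names))) %>%
  mutate(group = rep(paste0("cat", 1:4), length.out = n()))

# Hypothetical helper: run cor.test on every pair of columns in one data frame
test_pairs <- function(df, vars) {
  pairs <- combn(vars, 2)
  tibble(ratio_1 = pairs[1, ], ratio_2 = pairs[2, ]) %>%
    mutate(
      test    = map2(ratio_1, ratio_2,
                     ~ cor.test(df[[.x]], df[[.y]],
                                alternative = "less", method = "spearman")),
      rho     = map_dbl(test, ~ .x$estimate[[1]]),  # Spearman's rho
      p_value = map_dbl(test, "p.value")
    ) %>%
    select(-test)
}

results <- my_data %>%
  group_nest(group) %>%
  mutate(tests = map(data, test_pairs, vars = my_names)) %>%
  select(group, tests) %>%
  unnest(tests)

results  # one row per group and pair: 4 groups x 6 pairs
```

I left out conf.level here because, per the cor.test documentation, it only affects the confidence interval, which is reported for Pearson's correlation and not for Spearman.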

jm_t · October 30, 2019, 8:03am

Hi,

Yes, I would like to compute pairwise Spearman correlations for each combination of variables. I would also like to do the same for some subsets of my data, to test whether the correlations are higher when considering only the observations from one class.

From looking at your code, I was missing the group_nest "trick". I did not know it existed. Thanks a lot for taking the time to write the explanation. It helps a lot in getting into the tidyverse mindset.

Regards,

jm
