I would like to run pairwise cor.test on various columns of my dataset. Of course, I would like this to be dimension-aware, so that adding a new variable does not break my code.
Here is a reprex of what I have started to write. I can't access my dataset columns dynamically. I do not know if my syntax is wrong or if I am missing one step.
Also, if there is a package or function that does this, please let me know.
I would try the corrr package to see if it meets your needs.
my_data %>%
corrr::correlate()
This returns a tibble of correlations. Other helper functions in the package make it easier to play around with it to conform to your plotting/tabular needs:
# Stretch table to long-form, good for ggplot plotting
my_data %>%
corrr::correlate() %>% # can add diagonal=1 arg to specify default diagonal
corrr::stretch()
I tried to adapt the code so that I can run the test on every subset of the data set. I am failing at piping and group by. I grasp the tidyverse but am not yet up to speed.
Could you clarify for me what exactly you are trying to do? Are you trying to get pairwise correlations for every variable for different subsets of the data?
First off, %>% is just a way of passing the output of one function as the input to the next function (or, in the case of the first step, simply passing the data forward).
Next, group_nest() will group the data by your group variable, and nest them as data frames inside your data frame. This is a more advanced concept, but a very powerful one. Try just running my_data %>% group_nest(group) to see exactly what is happening here. The 25 observations that belong to each group are now stored in their own separate tibbles (a type of data frame), which makes it easier to apply functions to each data set
mutate() creates new variables, so we add the column cors, which holds the pairwise correlations for every pair of variables, computed separately for the data in each of the 4 groups. Then we create the variable stretch, which turns the correlation table from wide to long.
Lastly, we can use unnest(), which does the opposite of nesting: it takes the column of data frames and unpacks them back into our main data frame.
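The steps above can be sketched roughly as follows. This is a minimal sketch rather than a definitive implementation: my_data here is a made-up stand-in (4 groups of 25 rows with numeric columns a to d, chosen only to match the shape of the output shown), and the column names cors and stretch follow the description above.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(corrr)

# Hypothetical stand-in for my_data: 4 groups of 25 rows, numeric columns a-d
set.seed(42)
my_data <- tibble(
  group = rep(paste0("cat", 1:4), each = 25),
  a = rnorm(100),
  b = rnorm(100),
  c = rnorm(100),
  d = rnorm(100)
)

result <- my_data %>%
  # One row per group; each group's rows are stored as a tibble in `data`
  group_nest(group) %>%
  mutate(
    # Pairwise correlations within each group's tibble.
    # correlate() also accepts method = "spearman" for rank correlations.
    cors    = map(data, ~ corrr::correlate(.x, quiet = TRUE)),
    # Wide correlation table -> long (x, y, r) table
    stretch = map(cors, corrr::stretch)
  ) %>%
  # Unpack the long tables back into the main data frame
  unnest(stretch)

result
```

Because nothing in the pipeline names the numeric columns, adding a new numeric variable to my_data simply adds more x-y pairs to the result.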
# A tibble: 64 x 6
   group data              cors             x     y           r
   <chr> <list>            <list>           <chr> <chr>   <dbl>
 1 cat1  <tibble [25 x 4]> <tibble [4 x 5]> a     a     NA
 2 cat1  <tibble [25 x 4]> <tibble [4 x 5]> a     b      0.170
 3 cat1  <tibble [25 x 4]> <tibble [4 x 5]> a     c      0.411
 4 cat1  <tibble [25 x 4]> <tibble [4 x 5]> a     d      0.204
 5 cat1  <tibble [25 x 4]> <tibble [4 x 5]> b     a      0.170
 6 cat1  <tibble [25 x 4]> <tibble [4 x 5]> b     b     NA
 7 cat1  <tibble [25 x 4]> <tibble [4 x 5]> b     c      0.0285
 8 cat1  <tibble [25 x 4]> <tibble [4 x 5]> b     d      0.133
 9 cat1  <tibble [25 x 4]> <tibble [4 x 5]> c     a      0.411
10 cat1  <tibble [25 x 4]> <tibble [4 x 5]> c     b      0.0285
# ... with 54 more rows
The results shown above present the pairwise (x-y) correlations (r) for every group. We can also relatively easily spin this into a plot, one for each group.
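One possible way to draw such a plot is a faceted heatmap with ggplot2. The sketch below is only an illustration under assumptions: my_data is again a made-up stand-in with 4 groups and numeric columns a to d, and the choice of geom_tile() with a diverging fill scale is mine, not something prescribed by corrr.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(corrr)
library(ggplot2)

# Hypothetical stand-in for my_data: 4 groups of 25 rows, numeric columns a-d
set.seed(42)
my_data <- tibble(
  group = rep(paste0("cat", 1:4), each = 25),
  a = rnorm(100),
  b = rnorm(100),
  c = rnorm(100),
  d = rnorm(100)
)

# Long table of pairwise correlations per group (columns: group, x, y, r)
cors_long <- my_data %>%
  group_nest(group) %>%
  mutate(stretch = map(data, ~ corrr::stretch(corrr::correlate(.x, quiet = TRUE)))) %>%
  select(group, stretch) %>%
  unnest(stretch)

# One heatmap panel per group; diagonal cells are NA and drawn in grey
p <- ggplot(cors_long, aes(x = x, y = y, fill = r)) +
  geom_tile() +
  facet_wrap(~ group) +
  scale_fill_gradient2(limits = c(-1, 1)) +
  labs(x = NULL, y = NULL, fill = "r")

p
```

Fixing the fill limits at -1 and 1 keeps the colours comparable across panels, which helps when checking whether correlations are stronger in one group than in another.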
Yes, I would like to run pairwise Spearman correlations for every pair of variables. I would also like to do the same for some subsets of my data, to test whether correlations are higher when considering only the observations of one class.
Looking at your code, I was missing the group_nest "trick". I did not know it existed. Thanks a lot for taking the time to write the explanation. It helps a lot in getting into the tidyverse mindset.