Hi all,
I have some data with multiple variables describing the mean and standard error for different treatments. These 'mean' and 'se' columns are in wide format and I want to pivot them to long format for plotting with error bars.
The variable names have the treatment and the measure (mean or standard error) separated by an underscore (e.g., heat_mean, heat_se).
Here's a reproducible example:
library(dplyr)
library(tidyr)
dat <- tibble(sample=c(rep("A",5),rep("B",5)),
location=c(rep("Hg",5),rep("Fg",5)),
heat_mean=rnorm(10),
heat_se=rnorm(10,0,.1),
cold_mean=rnorm(10),
cold_se=rnorm(10,0,.1))
The data would look like this:
sample | location | heat_mean | heat_se | cold_mean | cold_se |
---|---|---|---|---|---|
A | Hg | 1.0515611 | -0.0721989 | 0.3574717 | 0.1161773 |
A | Hg | 0.2075425 | 0.0599422 | 1.3722471 | 0.0369714 |
A | Hg | 0.5360133 | -0.1289325 | -0.4205705 | -0.1072817 |
A | Hg | -0.2970411 | -0.0501931 | 2.6781306 | -0.1029467 |
A | Hg | 0.3707710 | -0.0326119 | 1.0663721 | 0.0049086 |
B | Fg | -0.7290617 | -0.0745034 | 0.4845398 | 0.1408884 |
B | Fg | 1.8045341 | 0.0495731 | 0.6030854 | -0.0407545 |
B | Fg | -0.5688072 | 0.0031257 | -1.1049746 | 0.0936773 |
B | Fg | -0.8443316 | 0.0485147 | 0.3302154 | -0.0919323 |
B | Fg | 2.0630788 | -0.0720528 | 0.2073953 | 0.0641618 |
To almost get what I need, I can use the special .value
indicator in the names_to
argument to pivot_longer
and either split the column on the separator with names_sep
or use names_pattern
and a regular expression with capturing groups to match the strings before and after the underscore.
dat %>% pivot_longer(c(heat_mean:cold_se),
names_to = c(".value","treatment"), names_sep = "_")
The result here has the structure I need, but I would have to recode the values in treatment
and rename the new columns (Or perhaps edit the variable names first).
sample | location | treatment | heat | cold |
---|---|---|---|---|
A | Hg | mean | 1.0515611 | 0.3574717 |
A | Hg | se | -0.0721989 | 0.1161773 |
A | Hg | mean | 0.2075425 | 1.3722471 |
A | Hg | se | 0.0599422 | 0.0369714 |
A | Hg | mean | 0.5360133 | -0.4205705 |
A | Hg | se | -0.1289325 | -0.1072817 |
This SO question pretty much describes my situation, but in my case the variables are always "treatment" _ "measure".
I tried to define capture groups with the text after the separator first, but couldn't the matches to work. This is the RE I used: "((?<=_)\\w+$)(\\w+(?=_))"
. I must be misunderstanding capture groups and how they work from left to right.
Is there a way to define the groups directly with a regexp, or does anyone know how I could easily swap the strings around the separator so that "heat_mean" becomes "mean_heat" and then I can just use the separators?
Thanks!