Greetings,
I am trying to use scale() to create z-scores for several variables at once.
Although I have figured out how to create the z-scores, I don't know how to save them to my data without overwriting the original variables. How can I add a prefix "z" to the new variables and save them to the original dataset?
Jason
#rstatsnewbie
# title: "reprex_z-scores"
# author: "Jason"
# date: "2019_04_30"
library(tidyverse)
library(reprex)
library(purrr)
library(here)
## IMPORT DATA FOR THE PILOT ANALYSES
dat <- read.csv(url('https://raw.githubusercontent.com/BrainStormCenter/ASQ_pilot/master/ASQ_pain_pilot_2019_04_19_v2.csv'), header = TRUE)
dat <- as_tibble(dat)
dat <- filter(dat, Groups != "other")
# CREATE A SMALLER DATA SET
dat.asq <- select(dat, ID, sex, asq_1a, asq_2a, asq_3a, asq_4a)
# CLACULATE Z-SCORES
scale(dat.asq[,c(grep("a$", colnames(dat.asq)))], center = TRUE, scale = TRUE)
#> asq_1a asq_2a asq_3a asq_4a
#> [1,] NA NA NA NA
#> [2,] NA NA NA NA
#> [3,] -1.02572353 -0.530116086 -0.4462601 0.11837965
#> [4,] -0.06098897 0.256197347 0.1792068 0.11837965
...
#> [136,] 0.90374560 0.256197347 0.1792068 0.11837965
#> [137,] 0.90374560 NA 0.1792068 1.28737874
#> attr(,"scaled:center")
#> asq_1a asq_2a asq_3a asq_4a
#> 2.063218 2.674179 2.713483 1.898734
#> attr(,"scaled:scale")
#> asq_1a asq_2a asq_3a asq_4a
#> 1.0365546 1.2717575 1.5988057 0.8554327
library(tidyverse)
dat <- read.csv(url('https://raw.githubusercontent.com/BrainStormCenter/ASQ_pilot/master/ASQ_pain_pilot_2019_04_19_v2.csv'), header = TRUE)
dat %>%
as_tibble() %>%
filter(Groups != "other") %>%
select(ID, sex, asq_1a, asq_2a, asq_3a, asq_4a) %>%
mutate_at(vars(starts_with("asq")), list(z = ~as.vector(scale(.))))
#> # A tibble: 137 x 10
#> ID sex asq_1a asq_2a asq_3a asq_4a asq_1a_z asq_2a_z asq_3a_z
#> <fct> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 IP003 1 NA NA NA NA NA NA NA
#> 2 IP005 1 NA NA NA NA NA NA NA
#> 3 IP008 2 1 2 2 2 -1.03 -0.530 -0.446
#> 4 IP009 2 2 3 3 2 -0.0610 0.256 0.179
#> 5 IP010 1 NA NA NA NA NA NA NA
#> 6 IP013 1 NA NA NA NA NA NA NA
#> 7 IP015 1 1 2 2 1 -1.03 -0.530 -0.446
#> 8 IP016 1 2 4 3 2 -0.0610 1.04 0.179
#> 9 IP017 1 3 3 3 1 0.904 0.256 0.179
#> 10 IP019 2 2 4 3 NA -0.0610 1.04 0.179
#> # … with 127 more rows, and 1 more variable: asq_4a_z <dbl>
I just learned that I was scoring the questionnaire incorrectly. Instead of calculating the z-scores by vector, they have to be calculated by individual (i.e., row-wise). I’ve tried this and was only successful in creating the new variables. However, all the values are NaN. Any help or guidance on what I’ve done wrong is greatly appreciated.
Cheers,
Jason
# title: "reprex_Zscores"
# author: "Jason"
# date: "2019_04_30"
#modified: "03/05/2019"
library(tidyverse)
library(reprex)
# library(purrr)
# library(here)
## Community answer
dat <- read.csv(url('https://raw.githubusercontent.com/BrainStormCenter/ASQ_pilot/master/ASQ_pain_pilot_2019_04_19_v2.csv'), header = TRUE)
# This creates z-scores for a variable/vector and works
# z.dat1 <- dat %>%
# as_tibble() %>%
# filter(Groups != "other") %>%
# select(ID, sex, asq_1a, asq_2a, asq_3a, asq_4a, asq_5a, asq_6a,
# asq_7a, asq_8a, asq_9a) %>%
# mutate_at(vars(starts_with("asq")), list(z1 = ~as.vector(scale(.))))
# Trying to calculate z-scores rowwise
# Colums are made but valvues are NaN
z.dat2 <- dat %>%
as_tibble() %>%
filter(Groups != "other") %>%
select(ID, sex, asq_1a, asq_2a, asq_3a, asq_4a, asq_5a, asq_6a,
asq_7a, asq_8a, asq_9a) %>%
rowwise() %>%
mutate_at(vars(starts_with("asq")), list(zz = ~as.vector(scale(.))))
Your code is triying to scale a single number for each column and that doesn't makes sense, that is why you get NaNs, one way to do this would be by reshaping your data
(Note: I'm using the new pivot_wider() function from the development version of dplyr)
library(tidyverse)
dat <- read.csv(url('https://raw.githubusercontent.com/BrainStormCenter/ASQ_pilot/master/ASQ_pain_pilot_2019_04_19_v2.csv'), header = TRUE)
z.dat2 <- dat %>%
as_tibble() %>%
filter(Groups != "other") %>%
select(ID, sex, asq_1a, asq_2a, asq_3a, asq_4a, asq_5a, asq_6a,
asq_7a, asq_8a, asq_9a) %>%
gather(asq, value, -ID, -sex) %>%
group_by(ID) %>%
mutate(z = as.vector(scale(value))) %>%
pivot_wider(names_from = asq, values_from = c(value, z))
z.dat2
#> # A tibble: 137 x 20
#> ID sex value_asq_1a value_asq_2a value_asq_3a value_asq_4a
#> <fct> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 IP003 1 NA NA NA NA
#> 2 IP005 1 NA NA NA NA
#> 3 IP008 2 1 2 2 2
#> 4 IP009 2 2 3 3 2
#> 5 IP010 1 NA NA NA NA
#> 6 IP013 1 NA NA NA NA
#> 7 IP015 1 1 2 2 1
#> 8 IP016 1 2 4 3 2
#> 9 IP017 1 3 3 3 1
#> 10 IP019 2 2 4 3 NA
#> # … with 127 more rows, and 14 more variables: value_asq_5a <dbl>,
#> # value_asq_6a <dbl>, value_asq_7a <dbl>, value_asq_8a <dbl>,
#> # value_asq_9a <dbl>, z_asq_1a <dbl>, z_asq_2a <dbl>, z_asq_3a <dbl>,
#> # z_asq_4a <dbl>, z_asq_5a <dbl>, z_asq_6a <dbl>, z_asq_7a <dbl>,
#> # z_asq_8a <dbl>, z_asq_9a <dbl>
Thank you. Because I use rowwise() to average these variables/subject, I, incorrectly, thought I could use scale() the same way. Thanks for the note about the pivot_wider() function. How do I get the development version of dplyr so I can try this solution?
Thank you. You have been very helpful. I have managed to actually include all the asq variables to scale. I have been trying to alter your code so that the results are added to z.dat2, but haven't had any success. I’ve tried different variations of mutate_at and select(everything()) but nothing has worked. Any suggestions on what I am missing or not understanding?
The problem is here, by excluding -everything() you are getting no effect with this command, what are you trying to accomplish by doing this?
I think that in this case it would be better to select the variables you want to gather instead of excluding the ones you don't. Try with something like this.
# This gathers all variables starting with "asq" and ending with "a"
gather(asq, value, matches("^asq.+a$"), na.rm = TRUE) %>%