Calculating Z-scores using Data Subsets by Row

statsbeginner · June 15, 2020, 8:55pm

Hi all,

Basically, I've been trying to create a simple script to calculate z-scores for a dataset using dplyr, along these lines:

zscores <- mydata %>%
  mutate_at(c(x:y), zscore, na.rm = TRUE)

where zscore is my current function:

zscore <- function(x, na.rm = FALSE) (x - mean(x, na.rm = na.rm)) / sd(x, na.rm = na.rm)

which uses the mean and SD of all the values in a column when calculating the z-score. I'm hoping to edit this script so that, for each column, the mean and SD are calculated using only the values that come from group 1.

For example

grp | trial 1 | trial 2 | trial 3...
1   | 4       | 6       | 3
2   | 3       | 8       | 7
1   | 3       | 5       | 9
3   | 8       | 2       | 2
4   | 7       | 7       | 1

For each column of trials, the script would hopefully calculate a mean and SD from only the values from grp 1 (4 and 3 in trial 1's case). Then, it would use those values to create a z-score for every value in the column.
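To make the target calculation concrete, here is a minimal base R sketch for the trial 1 column of the example table. The column names trial1, trial2 and trial3 are assumed spellings of the headers above, and the variable names ref, ref_mean and ref_sd are made up for illustration.

```r
# Example table from the post; trial1..trial3 are assumed column names
mydata <- data.frame(
  grp    = c(1, 2, 1, 3, 4),
  trial1 = c(4, 3, 3, 8, 7),
  trial2 = c(6, 8, 5, 2, 7),
  trial3 = c(3, 7, 9, 2, 1)
)

# Logical index of the reference rows (grp 1 only)
ref <- mydata$grp == 1

# Mean and SD of trial1 within grp 1 (the values 4 and 3)
ref_mean <- mean(mydata$trial1[ref], na.rm = TRUE)  # 3.5
ref_sd   <- sd(mydata$trial1[ref], na.rm = TRUE)    # about 0.707

# z-score of every trial1 value against the grp 1 reference
z_trial1 <- (mydata$trial1 - ref_mean) / ref_sd
```

Every row, whatever its group, is standardised against the same grp 1 mean and SD, so the two grp 1 rows come out at about +0.707 and -0.707.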

I've tried editing the function to something like:

(x, na.rm = FALSE) (x - mean(filter(group == "1"), na.rm = TRUE) / sd(filter(group == "1"), na.rm = TRUE))

or

(x, na.rm = FALSE) (x - mean(x[group==1]), na.rm = TRUE) / sd(x(group==1), na.rm = TRUE))

but neither has worked as intended. I feel like this should have an easy solution, but I'm having a lot of trouble figuring it out.

Thanks so much in advance for your help!
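For reference, a few things stand out in the two attempts above: filter() is a data-frame verb, so it cannot be given a bare condition inside mean(); x(group==1) calls x as a function instead of subsetting it with square brackets; and the parentheses are placed so that na.rm ends up outside mean(). One possible way to write the intended calculation is sketched below. It is not the only approach: it assumes dplyr 1.0.0 or later (for across()), assumes the column names grp and trial1 to trial3, and uses a hypothetical helper called z_against_ref.

```r
library(dplyr)

# Example table from the post; trial1..trial3 are assumed column names
mydata <- tibble(
  grp    = c(1, 2, 1, 3, 4),
  trial1 = c(4, 3, 3, 8, 7),
  trial2 = c(6, 8, 5, 2, 7),
  trial3 = c(3, 7, 9, 2, 1)
)

# Hypothetical helper: standardise x using the mean and SD of the
# values flagged by the logical vector `ref`
z_against_ref <- function(x, ref, na.rm = FALSE) {
  (x - mean(x[ref], na.rm = na.rm)) / sd(x[ref], na.rm = na.rm)
}

# Apply the helper to each trial column, with grp 1 as the reference
zscores <- mydata %>%
  mutate(across(trial1:trial3,
                ~ z_against_ref(.x, ref = grp == 1, na.rm = TRUE)))
```

Passing the reference rows as a logical vector keeps the helper independent of any particular grouping column, so the same function would work with a different reference group.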

StatSteph · June 15, 2020, 9:20pm

My big hint is to use the group_by function. This looks a bit like a homework problem, so I don't think it's appropriate to provide a direct solution. See FAQ: Homework Policy.
