I am trying to create a function that will search a dataframe xfile
for cases where ZSCORE
is more than 3 or less than -3, and item
matches the function argument. Then it searches another dataframe data
for PERSON LABEL
in xfile
and matches them to sidtp
in data
. When found it replaces the value of item
for that row only.
I can't get it to work exactly how I want and below is some modified code where item is hardcoded as HG13_2
. I have included examples of much smaller datasets than I am working with in reality to give an example.
library(tidyverse)
library(readxl)
library(haven)
#set zvalue thresholds for items
z_high <- 3
z_low <- -3
setwd("")
data <- read_excel("small_dataset.xlsx")
xfile <- read_excel("small_xfile.xlsx")
test_func3 <- function(xfile, item, data) {
xfile <- xfile %>%
filter((xfile$ZSCORE > z_high | xfile$ZSCORE < z_low) &
xfile$`ITEM LABEL` == item)
data <- mutate(data, HG13_2 = replace(HG13_2, sidtp
%in% xfile$`PERSON LABEL`, 999))
}
test3 <- test_func3(xfile, "HG13_2", data)
This is the contents of data
# A tibble: 11 × 3
sidtp HG13_2 HG14_2
<chr> <dbl> <dbl>
1 person_1 1 1
2 person_2 1 1
3 person_3 1 1
4 person_4 1 1
5 person_5 1 1
6 person_6 1 1
7 person_7 1 1
8 person_8 1 1
9 person_9 1 1
10 person_10 1 1
11 person_11 1 1
This is the contents of xfile
# A tibble: 11 × 5
PERSON ITEM ZSCORE `PERSON LABEL` `ITEM LABEL`
<dbl> <dbl> <dbl> <chr> <chr>
1 22 8 3.10 person_1 HG13_2
2 142 8 3.01 person_2 HG13_2
3 177 8 4.00 person_3 HG13_2
4 296 8 3.27 person_4 HG13_2
5 411 8 5.57 person_5 HG13_2
6 483 8 3.17 person_6 HG13_2
7 587 8 3.07 person_7 HG13_2
8 835 8 3.38 person_8 HG13_2
9 971 8 7.88 person_9 HG13_2
10 1048 8 3.24 person_10 HG13_2
11 9999 9 3.11 person_11 HG14_2
Running the code above gives me my desired output where matched cases have their values replaced for that cell only.
# A tibble: 11 × 3
sidtp HG13_2 HG14_2
<chr> <dbl> <dbl>
1 person_1 999 1
2 person_2 999 1
3 person_3 999 1
4 person_4 999 1
5 person_5 999 1
6 person_6 999 1
7 person_7 999 1
8 person_8 999 1
9 person_9 999 1
10 person_10 999 1
11 person_11 1 1
Here is what I want my code to look like
test_func3 <- function(xfile, item, data) {
xfile <- xfile %>%
filter((xfile$ZSCORE > z_high | xfile$ZSCORE < z_low) &
xfile$`ITEM LABEL` == item)
data <- mutate(data, item = replace(item, sidtp
%in% xfile$`PERSON LABEL`, 999))
}
test3 <- test_func3(xfile, "HG13_2", data)
The only difference is that HG13_2
inside mutate
and replace
has been changed to item
.
However, when I run this code it creates a new column with 999
for matched cells and NA
for everything else.
# A tibble: 11 × 4
# A tibble: 11 × 4
sidtp HG13_2 HG14_2 item
<chr> <dbl> <dbl> <chr>
1 person_1 1 1 999
2 person_2 1 1 999
3 person_3 1 1 999
4 person_4 1 1 999
5 person_5 1 1 999
6 person_6 1 1 999
7 person_7 1 1 999
8 person_8 1 1 999
9 person_9 1 1 999
10 person_10 1 1 999
11 person_11 1 1 NA
I think my issue has something to do with data masking, or scoping, but can't figure it out.