I am trying to create a function that will search a dataframe xfile for cases where ZSCORE is more than 3 or less than -3, and item matches the function argument. Then it searches another dataframe data for PERSON LABEL in xfile and matches them to sidtp in data. When found it replaces the value of item for that row only.
I can't get it to work exactly how I want and below is some modified code where item is hardcoded as HG13_2. I have included examples of much smaller datasets than I am working with in reality to give an example.
library(tidyverse)
library(readxl)
library(haven)
#set zvalue thresholds for items
z_high <- 3
z_low <- -3
setwd("")
data <- read_excel("small_dataset.xlsx")
xfile <- read_excel("small_xfile.xlsx")
test_func3 <- function(xfile, item, data) {
xfile <- xfile %>%
filter((xfile$ZSCORE > z_high | xfile$ZSCORE < z_low) &
xfile$`ITEM LABEL` == item)
data <- mutate(data, HG13_2 = replace(HG13_2, sidtp
%in% xfile$`PERSON LABEL`, 999))
}
test3 <- test_func3(xfile, "HG13_2", data)
This is the contents of data
# A tibble: 11 × 3
sidtp HG13_2 HG14_2
<chr> <dbl> <dbl>
1 person_1 1 1
2 person_2 1 1
3 person_3 1 1
4 person_4 1 1
5 person_5 1 1
6 person_6 1 1
7 person_7 1 1
8 person_8 1 1
9 person_9 1 1
10 person_10 1 1
11 person_11 1 1
This is the contents of xfile
# A tibble: 11 × 5
PERSON ITEM ZSCORE `PERSON LABEL` `ITEM LABEL`
<dbl> <dbl> <dbl> <chr> <chr>
1 22 8 3.10 person_1 HG13_2
2 142 8 3.01 person_2 HG13_2
3 177 8 4.00 person_3 HG13_2
4 296 8 3.27 person_4 HG13_2
5 411 8 5.57 person_5 HG13_2
6 483 8 3.17 person_6 HG13_2
7 587 8 3.07 person_7 HG13_2
8 835 8 3.38 person_8 HG13_2
9 971 8 7.88 person_9 HG13_2
10 1048 8 3.24 person_10 HG13_2
11 9999 9 3.11 person_11 HG14_2
Running the code above gives me my desired output where matched cases have their values replaced for that cell only.
# A tibble: 11 × 3
sidtp HG13_2 HG14_2
<chr> <dbl> <dbl>
1 person_1 999 1
2 person_2 999 1
3 person_3 999 1
4 person_4 999 1
5 person_5 999 1
6 person_6 999 1
7 person_7 999 1
8 person_8 999 1
9 person_9 999 1
10 person_10 999 1
11 person_11 1 1
Here is what I want my code to look like
test_func3 <- function(xfile, item, data) {
xfile <- xfile %>%
filter((xfile$ZSCORE > z_high | xfile$ZSCORE < z_low) &
xfile$`ITEM LABEL` == item)
data <- mutate(data, item = replace(item, sidtp
%in% xfile$`PERSON LABEL`, 999))
}
test3 <- test_func3(xfile, "HG13_2", data)
The only difference is that HG13_2 inside mutate and replace has been changed to item.
However, when I run this code it creates a new column with 999 for matched cells and NA for everything else.
# A tibble: 11 × 4
# A tibble: 11 × 4
sidtp HG13_2 HG14_2 item
<chr> <dbl> <dbl> <chr>
1 person_1 1 1 999
2 person_2 1 1 999
3 person_3 1 1 999
4 person_4 1 1 999
5 person_5 1 1 999
6 person_6 1 1 999
7 person_7 1 1 999
8 person_8 1 1 999
9 person_9 1 1 999
10 person_10 1 1 999
11 person_11 1 1 NA
I think my issue has something to do with data masking, or scoping, but can't figure it out.