Passing a function argument into mutate

I am trying to write a function that searches a data frame, xfile, for rows where ZSCORE is greater than 3 or less than -3 and ITEM LABEL matches the function argument item. It should then take the PERSON LABEL values from those rows and match them against sidtp in a second data frame, data. For each matching row, it should replace the value in the column named by item with 999, for that row only.

I can't get it to work exactly how I want. Below is a modified version of the code in which item is hardcoded as HG13_2. The example datasets are much smaller than the ones I am really working with.

library(tidyverse)
library(readxl)
library(haven)

#set zvalue thresholds for items
z_high <- 3
z_low <- -3

setwd("")

data <- read_excel("small_dataset.xlsx")
xfile <- read_excel("small_xfile.xlsx") 

test_func3 <- function(xfile, item, data) {
  xfile <- xfile %>%
    filter((xfile$ZSCORE > z_high | xfile$ZSCORE < z_low) & 
             xfile$`ITEM LABEL` == item)
  
  data <- mutate(data, HG13_2 = replace(HG13_2, sidtp 
                                        %in% xfile$`PERSON LABEL`, 999))
}

test3 <- test_func3(xfile, "HG13_2", data)

This is the contents of data

# A tibble: 11 × 3
   sidtp     HG13_2 HG14_2
   <chr>      <dbl>  <dbl>
 1 person_1       1      1
 2 person_2       1      1
 3 person_3       1      1
 4 person_4       1      1
 5 person_5       1      1
 6 person_6       1      1
 7 person_7       1      1
 8 person_8       1      1
 9 person_9       1      1
10 person_10      1      1
11 person_11      1      1

This is the contents of xfile

# A tibble: 11 × 5
   PERSON  ITEM ZSCORE `PERSON LABEL` `ITEM LABEL`
    <dbl> <dbl>  <dbl> <chr>          <chr>       
 1     22     8   3.10 person_1       HG13_2      
 2    142     8   3.01 person_2       HG13_2      
 3    177     8   4.00 person_3       HG13_2      
 4    296     8   3.27 person_4       HG13_2      
 5    411     8   5.57 person_5       HG13_2      
 6    483     8   3.17 person_6       HG13_2      
 7    587     8   3.07 person_7       HG13_2      
 8    835     8   3.38 person_8       HG13_2      
 9    971     8   7.88 person_9       HG13_2      
10   1048     8   3.24 person_10      HG13_2      
11   9999     9   3.11 person_11      HG14_2

Running the code above gives me the desired output: for matched cases, only the cell in that column is replaced.

# A tibble: 11 × 3
   sidtp     HG13_2 HG14_2
   <chr>      <dbl>  <dbl>
 1 person_1     999      1
 2 person_2     999      1
 3 person_3     999      1
 4 person_4     999      1
 5 person_5     999      1
 6 person_6     999      1
 7 person_7     999      1
 8 person_8     999      1
 9 person_9     999      1
10 person_10    999      1
11 person_11      1      1

Here is what I want my code to look like

test_func3 <- function(xfile, item, data) {
  xfile <- xfile %>%
    filter((xfile$ZSCORE > z_high | xfile$ZSCORE < z_low) & 
             xfile$`ITEM LABEL` == item)
  
  data <- mutate(data, item = replace(item, sidtp 
                                        %in% xfile$`PERSON LABEL`, 999))
}

test3 <- test_func3(xfile, "HG13_2", data)

The only difference is that HG13_2 inside mutate and replace has been changed to item.

However, when I run this code it creates a new column called item, containing 999 for matched rows and NA for everything else.

# A tibble: 11 × 4
   sidtp     HG13_2 HG14_2 item 
   <chr>      <dbl>  <dbl> <chr>
 1 person_1       1      1 999  
 2 person_2       1      1 999  
 3 person_3       1      1 999  
 4 person_4       1      1 999  
 5 person_5       1      1 999  
 6 person_6       1      1 999  
 7 person_7       1      1 999  
 8 person_8       1      1 999  
 9 person_9       1      1 999  
10 person_10      1      1 999  
11 person_11      1      1 NA  

I think my issue has something to do with data masking, or scoping, but can't figure it out.

Indeed it's data masking.

This should work (I changed two column names because of the spaces):

library(tidyverse)


data <- read.table(text = "sidtp     HG13_2 HG14_2
  1 person_1       1      1
2 person_2       1      1
3 person_3       1      1
4 person_4       1      1
5 person_5       1      1
6 person_6       1      1
7 person_7       1      1
8 person_8       1      1
9 person_9       1      1
10 person_10      1      1
11 person_11      1      1", header = TRUE)


xfile <- read.table(text = "PERSON  ITEM ZSCORE PERSON_LABEL ITEM_LABEL
 1     22     8   3.10 person_1       HG13_2      
 2    142     8   3.01 person_2       HG13_2      
 3    177     8   4.00 person_3       HG13_2      
 4    296     8   3.27 person_4       HG13_2      
 5    411     8   5.57 person_5       HG13_2      
 6    483     8   3.17 person_6       HG13_2      
 7    587     8   3.07 person_7       HG13_2      
 8    835     8   3.38 person_8       HG13_2      
 9    971     8   7.88 person_9       HG13_2      
10   1048     8   3.24 person_10      HG13_2      
11   9999     9   3.11 person_11      HG14_2", header = TRUE)




z_high <- 3
z_low <- -3


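test_func3 <- function(xfile, item, data) {
  # Keep only the rows for this item whose z-score is outside the thresholds
  flagged <- xfile %>%
    filter((ZSCORE > z_high | ZSCORE < z_low) & ITEM_LABEL == item)

  # item holds a column name as a string: .data[[item]] reads that column,
  # and {{item}} := writes the result back to the column with that name
  mutate(data, {{item}} := replace(.data[[item]], sidtp
                                      %in% flagged$PERSON_LABEL, 999))
}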

test3 <- test_func3(xfile, "HG13_2", data)
test3
#>        sidtp HG13_2 HG14_2
#> 1   person_1    999      1
#> 2   person_2    999      1
#> 3   person_3    999      1
#> 4   person_4    999      1
#> 5   person_5    999      1
#> 6   person_6    999      1
#> 7   person_7    999      1
#> 8   person_8    999      1
#> 9   person_9    999      1
#> 10 person_10    999      1
#> 11 person_11      1      1

Created on 2022-05-25 by the reprex package (v2.0.1)

You have two separate problems. First, in replace(item, ...) you don't want a column named item, but the column of the data frame whose name is stored in the variable item. In base R you would use data[[item]] (without quotes); with data masking you get the same result with the .data pronoun, i.e. .data[[item]]. To be honest, I never know exactly whether to use this or {{, so I tried both.
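
To make the first problem concrete, here is a small base R sketch (no dplyr needed) of what appears to happen in the question's version. It assumes, as in the question, that data has no column called item, so inside mutate() the name item falls through to the function argument, which is a length-one string. The vector idx is only a stand-in for sidtp %in% xfile$`PERSON LABEL`, and the names bad and good are made up for this illustration.

```r
item <- "HG13_2"                  # the function argument: a string, not a column
idx  <- c(rep(TRUE, 10), FALSE)   # stand-in for sidtp %in% xfile$`PERSON LABEL`

# replace() assigns 999 into the string itself and stretches it to length 11;
# the result is a character vector, and the unassigned 11th slot becomes NA
bad <- replace(item, idx, 999)

# Base R equivalent of .data[[item]]: look the column up by the name in item
data <- data.frame(sidtp = paste0("person_", 1:11), HG13_2 = 1, HG14_2 = 1)
good <- replace(data[[item]], idx, 999)
```

Here bad has the same shape as the unwanted item column in the question (a character vector ending in NA), while good is the numeric vector the question was after.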

Second, you don't want to store the result in a new column named "item", but in the column whose name is stored in item, so you need the := operator, with glue syntax (so, double braces) on the left.
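
On the question of .data versus {{: as far as I understand the "Programming with dplyr" vignette, {{ is meant for arguments passed as unquoted column names, while .data[[item]] and a glue-style name such as "{item}" are meant for column names held in a string, which is the case here. The sketch below shows two spellings under that assumption. The helper names flag_item and flag_item_across, the ids argument, and the toy values are all hypothetical and not from the original posts.

```r
library(dplyr)

# Toy data with the thread's column names (values are made up)
data <- tibble(
  sidtp  = c("person_1", "person_2", "person_3"),
  HG13_2 = c(1, 1, 1),
  HG14_2 = c(1, 1, 1)
)
flagged <- c("person_1", "person_3")  # hypothetical IDs to overwrite

# item is a column name passed as a string:
# "{item}" names the output column, .data[[item]] reads the input column
flag_item <- function(data, item, ids) {
  mutate(data, "{item}" := replace(.data[[item]], sidtp %in% ids, 999))
}

# Same idea with across(); all_of() selects the column named by the string,
# and .x is that column's current values
flag_item_across <- function(data, item, ids) {
  mutate(data, across(all_of(item), ~ replace(.x, sidtp %in% ids, 999)))
}

res1 <- flag_item(data, "HG13_2", flagged)
res2 <- flag_item_across(data, "HG13_2", flagged)
```

Both helpers overwrite the existing HG13_2 column instead of adding a new one. The across() form also extends to several columns at once if item is a character vector of names.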
