using argument passed to function for left join

Hello

I'm attempting this:

f_replace_codes(statcat = "police_force")   


f_replace_codes <- function(statcat) 
  
{

    df1 <- df2 %>% left_join(code_label_temp, by = c(statcat='code')) 

}

but I receive this error

Error: Join columns must be present in data.
x Problem with `statcat`.

the columns of df2 include "police_force" so it is not a spelling issue. I don't know why left_join isn't evaluating the argument as 'police_force' if that is the issue. Thanks

For left_join to work

df2 <- df1 %>% left_join(code_label_temp, by = c(statcat='code'))

It's not clear what the aim is, so I can't suggest how to join. See the FAQ: How to do a minimal reproducible example reprex for beginners.

Thanks for your reply and I am sorry that I was too brief:

df1:

eventID <- c(1, 2, 3, 4, 5, 6, 7, 8)
police_force <- c(1, 2, 1, 6, 4, 1, 6, 3)
year <- c(2014, 2014, 2009, 2020, 2015, 2016, 2014, 2015)
day_of_week <- c(7, 2, 2, 1, 5, 4, 5, 3)

df1 <- data.frame(eventID, police_force, year,day_of_week)

code_label

code <- c(1,2,3,4,5,6,1,2,3,4,5,6,7)
label <- c("Force_A", "Force_B","Force_C","Force_D","Force_E","Force_F", "Monday", "Tuesday","Wednesday","Thursday","Friday", "Saturday", "Sunday")
field_name <- c("police_force", "police_force", "police_force", "police_force", "police_force", "police_force", "dayofweek", "dayofweek", "dayofweek", "dayofweek", "dayofweek", "dayofweek", "dayofweek")

code_label <- data.frame(code,label,field_name)

code_label_temp - is code label filtered, in my example by field_name "police_force" (mentioned so that the reason for the function is clearer I hope):

code <- c(1,2,3,4,5,6)
label <- c("Force_A", "Force_B","Force_C","Force_D","Force_E","Force_F")
field_name <- c("police_force", "police_force", "police_force", "police_force", "police_force", "police_force")

code_label_temp <- data.frame(code,label,field_name)

My eventual aim is to loop through the columns in df1 and replace all the codes with labels, being new to functions I just wanted to get the join working before looking at loops. So:

df2 <- df1 %>% left_join(code_label_temp, by = c("police_force"="code"))

gives the result I want.

PROBLEM:
I want to pass the column name in df1 as an argument "statcat" to the function as follows:

f_replace_codes(statcat = "police_force")   

f_replace_codes <- function(statcat) 
  
{

    df1 <- df2 %>% left_join(code_label_temp, by = c(statcat='code')) 

}

but that gives the error:

Error: Join columns must be present in data.
x Problem with `statcat`

Still can't do a left join for the same reason as before.

Let's step back and focus not on how to do a particular operation but ask what the operation is in aid of. Every R problem can be thought of with advantage as the interaction of three objects— an existing object, x , a desired object,y , and a function, f, that will return a value of y given x as an argument. In other words, school algebra— f(x) = y. Any of the objects can be composites.

In this case x is df1 and y is a data frame substituting values of two columns that are represented with numeric codes with assigned character string codes. The reprex below provides two functions that do this.

The one for day_of_week takes advantage of a function, wday, that is designed specifically for the purpose of representing the integers 1:7 as names of days. At the expense of expressly matching the integers with the corresponding weekday names, it could be done the same way as the second function, using case_when.

Those functions operate on sub-objects in x and they are glued together with wrapping functions $ in the first case that simply replaces a column in place with the return value of the wday function. In the second case, we use mutate to wrap case_when.

suppressPackageStartupMessages({
  library(dplyr)
  library(lubridate)
})

eventID <- c(1, 2, 3, 4, 5, 6, 7, 8)
police_force <- c(1, 2, 1, 6, 4, 1, 6, 3)
year <- c(2014, 2014, 2009, 2020, 2015, 2016, 2014, 2015)
day_of_week <- c(7, 2, 2, 1, 5, 4, 5, 3)

df1 <- data.frame(eventID, police_force, year,day_of_week)

df1
#>   eventID police_force year day_of_week
#> 1       1            1 2014           7
#> 2       2            2 2014           2
#> 3       3            1 2009           2
#> 4       4            6 2020           1
#> 5       5            4 2015           5
#> 6       6            1 2016           4
#> 7       7            6 2014           5
#> 8       8            3 2015           3

# convert day_of_week to factor with word labels
df1$day_of_week <- wday(df1$day_of_week, label = TRUE, abbr = FALSE)

# convert police_force numeric identification to a string represention

df1 <- df1 %>% mutate(police_force = case_when(
  police_force == 1 ~ "Force A",
  police_force == 2 ~ "Force B",
  police_force == 3 ~ "Force C",
  police_force == 4 ~ "Force D",
  police_force == 5 ~ "Force E",
  police_force == 6 ~ "Force F")
)

df1
#>   eventID police_force year day_of_week
#> 1       1      Force A 2014    Saturday
#> 2       2      Force B 2014      Monday
#> 3       3      Force A 2009      Monday
#> 4       4      Force F 2020      Sunday
#> 5       5      Force D 2015    Thursday
#> 6       6      Force A 2016   Wednesday
#> 7       7      Force F 2014    Thursday
#> 8       8      Force C 2015     Tuesday

df1 <- data.frame(eventID, police_force, year,day_of_week)

# wrap it up in a function
recode_df <- function(x){
  x$day_of_week <- wday(x$day_of_week, label = TRUE, abbr = FALSE)
  x %>% mutate(police_force = case_when(
    police_force == 1 ~ "Force A",
    police_force == 2 ~ "Force B",
    police_force == 3 ~ "Force C",
    police_force == 4 ~ "Force D",
    police_force == 5 ~ "Force E",
    police_force == 6 ~ "Force F"))
}

recode_df(df1)
#>   eventID police_force year day_of_week
#> 1       1      Force A 2014    Saturday
#> 2       2      Force B 2014      Monday
#> 3       3      Force A 2009      Monday
#> 4       4      Force F 2020      Sunday
#> 5       5      Force D 2015    Thursday
#> 6       6      Force A 2016   Wednesday
#> 7       7      Force F 2014    Thursday
#> 8       8      Force C 2015     Tuesday

Hello
thank you for that solution, which I understand. But in the case that df1 has twenty-plus columns with code for which I want to show the label value and code_label has all the codes and labels that I will need to use, then it seemed to me that using case_when would basically be copying out the code_label table. And I was wondering if it was possible to do the same thing in fewer lines and without inviting typos. That was my 'why'. But then I was, for me, on the thin ice of 'how' in looking at whether I could achieve all those case_whens by using generic code in a function. I will certainly use the method you've chosen but that still leaves an unanswered question, ie, that as this produces a result:

df2 <- df1 %>% left_join(code_label_temp, by = c("police_force"="code"))

why does the following produce an error?

f_replace_codes(statcat = "police_force")   

f_replace_codes <- function(statcat) 
  
{

    df2 <- df1 %>% left_join(code_label_temp, by = c(statcat="code")) 

}

I've been reading about how verbs are evaluated inside functions, I'm guessing that may have something to do with it, but I have not yet achieved understanding.

Here's a solution that doesn't rely on case_when but does rely on being able to index the character representations of police_force by ascending integers.

suppressPackageStartupMessages({
  library(dplyr)
  library(lubridate)
})

eventID <- c(1, 2, 3, 4, 5, 6, 7, 8)
police_force <- c(1, 2, 1, 6, 4, 1, 6, 3)
year <- c(2014, 2014, 2009, 2020, 2015, 2016, 2014, 2015)
day_of_week <- c(7, 2, 2, 1, 5, 4, 5, 3)

df1 <- data.frame(eventID, police_force, year,day_of_week)
df1
#>   eventID police_force year day_of_week
#> 1       1            1 2014           7
#> 2       2            2 2014           2
#> 3       3            1 2009           2
#> 4       4            6 2020           1
#> 5       5            4 2015           5
#> 6       6            1 2016           4
#> 7       7            6 2014           5
#> 8       8            3 2015           3
force_codes <- paste0("Force_",LETTERS)

df2 <- df1 %>% mutate(police_force = force_codes[police_force])
df2
#>   eventID police_force year day_of_week
#> 1       1      Force_A 2014           7
#> 2       2      Force_B 2014           2
#> 3       3      Force_A 2009           2
#> 4       4      Force_F 2020           1
#> 5       5      Force_D 2015           5
#> 6       6      Force_A 2016           4
#> 7       7      Force_F 2014           5
#> 8       8      Force_C 2015           3

# create joiner dataframe
j <- data.frame(police_force = police_force, force_code = force_codes[police_force])

f_replace_codes <- function(x,y,z) inner_join(x,y, by = z, keep = FALSE) %>% distinct()

f_replace_codes(df1,j,"police_force")
#>   eventID police_force year day_of_week force_code
#> 1       1            1 2014           7    Force_A
#> 2       2            2 2014           2    Force_B
#> 3       3            1 2009           2    Force_A
#> 4       4            6 2020           1    Force_F
#> 5       5            4 2015           5    Force_D
#> 6       6            1 2016           4    Force_A
#> 7       7            6 2014           5    Force_F
#> 8       8            3 2015           3    Force_C

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.