Hello,
I'm building a package for the first time and struggling with a problem that I believe is related to environments. In my package, I have a function that uses two data frames as in this example (this isn't the actual function, but should make the point):
my_function <- function() {
  # df1 and df2 are the internal data frames stored in R/sysdata.rda
  temp <- df1 %>%
    dplyr::left_join(df2, by = "id")
  return(temp)
}
I have included the data frames as "internal data" in the package, stored in R/sysdata.rda. The function works fine. I can install the package and attach it in a new R session with library() and it does what it should.
However, in my workflow, I will be changing df1 and df2. The versions of these I put into the package are for testing the functions, but in reality I want them to work on versions of these objects that are going to grow over time. If, in my R session, I create another variable called "df1" (that has more rows than the version of df1 that is in my package) in the Global environment and run the function, it seems to use the package version of df1 and not the one in the Global environment.
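Here is a minimal sketch that reproduces what I'm seeing without building a package. It assumes that a plain environment (the hypothetical fake_pkg below) behaves roughly like the package namespace as far as name lookup goes, which may not be exactly right:

```r
# fake_pkg is a hypothetical stand-in for the package namespace; a real
# namespace is created by R when the package is loaded.
fake_pkg <- new.env()
fake_pkg$df1 <- data.frame(id = 1:2)   # small "package" version of df1

# A function whose enclosing environment is fake_pkg, so free variables
# such as df1 are looked up there first.
n_rows <- function() nrow(df1)
environment(n_rows) <- fake_pkg
fake_pkg$n_rows <- n_rows

# A larger df1 created in the calling (e.g. global) environment
df1 <- data.frame(id = 1:5)

fake_pkg$n_rows()   # returns 2: the copy inside fake_pkg is used
```

The function never looks at the five-row df1, because the copy in its own enclosing environment is found first.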
Please can anyone suggest what the best practice would be to overcome this? Is it foolish to have this data in my package at all? My thought was to somehow override which version the function uses: include a step where it searches for the object in the Global environment, uses that if it exists, and falls back to the version in the package if it can't find one there.
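To make the idea concrete, here is a rough sketch of the fallback I have in mind. The helper get_or_default() and the objects pkg_df1 and pkg_df2 are hypothetical names made up for this example (in the real package the fallbacks would be the internal df1 and df2), and base merge() stands in for dplyr::left_join() so the snippet runs without extra packages:

```r
# Stand-ins for the internal objects that would live in R/sysdata.rda
pkg_df1 <- data.frame(id = 1:2, x = c("a", "b"))
pkg_df2 <- data.frame(id = 1:3, y = c(10, 20, 30))

# Hypothetical helper: return the object called `name` from `envir` if it
# exists there and is a data frame; otherwise return `fallback`.
get_or_default <- function(name, fallback, envir = globalenv()) {
  if (exists(name, envir = envir, inherits = FALSE)) {
    candidate <- get(name, envir = envir, inherits = FALSE)
    if (is.data.frame(candidate)) {
      return(candidate)
    }
  }
  fallback
}

my_function <- function(envir = globalenv()) {
  left  <- get_or_default("df1", pkg_df1, envir)
  right <- get_or_default("df2", pkg_df2, envir)
  # Left join on id: keep every row of `left`
  merge(left, right, by = "id", all.x = TRUE)
}

# Usage: an environment holding a larger df1 overrides the packaged copy
user_env <- new.env()
user_env$df1 <- data.frame(id = 1:3, x = c("a", "b", "c"))
my_function(envir = user_env)
```

I'm not sure whether reaching into the Global environment like this is considered good practice, or whether it would be cleaner to pass the data frames in as arguments that default to the packaged versions.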
Thanks very much!