The "by" argument of left_join in dplyr

Hello,

I tried to use left_join inside my_function, but it does not seem possible to pass two variables to by = c(...).
Here is the relevant part of my function:

my_function <- function(tab, pt, origin, id, x, y) {
  join <- tab %>%
    left_join(pt, by = c(origin = id)) %>%
    rename(Xi = x, Yi = y)
}
Error: `by` can't contain join column `origin` which is missing from LHS

Do you have any idea what causes this error?

Thanks in advance

This means there is no column named origin on the data.frame called tab. LHS means left-hand side.
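
To see where that name comes from, here is a minimal sketch using only base R. The values "i" and "CODE" are assumed from the call shown later in the thread. Inside c(), the text to the left of = is taken literally as the element name, so the value stored in origin is never looked at:

```r
# Assumed values, taken from the call further down the thread
origin <- "i"
id <- "CODE"

# The name on the left of `=` is used literally
by_literal <- c(origin = id)
names(by_literal)  # "origin", not "i"

# What left_join() actually needs for this data
by_wanted <- c("i" = "CODE")
names(by_wanted)   # "i"
```

So left_join() is asked to match a column literally called origin, which tab does not have.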

The structure of tab is :

   i   j    var
1 T1  T1 291058
2 T1 T10   8297
3 T1 T11   3889
4 T1 T12  17064
5 T1  T2  12163

I want the variable origin to take the value "i".

The structure of the pt is :

  CODE               X                Y
1   T1 2,3428828903353 48,8566258524875
...

I don't understand why I can't write by = c(origin = id) in my_function.

The left_join() function does not create columns; it adds columns from one data frame to another. Here is an example with data similar to yours.

library(dplyr)

tab <- data.frame(i = rep("T1", 5),
                 j = c("T1", "T10", "T11", "T12", "T2"),
                 var = c(291058, 8297, 3889, 17064, 12163), stringsAsFactors = FALSE)
tab
#>    i   j    var
#> 1 T1  T1 291058
#> 2 T1 T10   8297
#> 3 T1 T11   3889
#> 4 T1 T12  17064
#> 5 T1  T2  12163
pt = data.frame(CODE = "T1", X = 2.34, Y = 48.86, stringsAsFactors = FALSE)
pt
#>   CODE    X     Y
#> 1   T1 2.34 48.86

join <- left_join(tab, pt, by = c("j" = "CODE"))
join
#>    i   j    var    X     Y
#> 1 T1  T1 291058 2.34 48.86
#> 2 T1 T10   8297   NA    NA
#> 3 T1 T11   3889   NA    NA
#> 4 T1 T12  17064   NA    NA
#> 5 T1  T2  12163   NA    NA

Created on 2019-12-29 by the reprex package (v0.3.0.9000)

Can you show how you want the data to look as a result of the function?

Below is the result that I need.
In the previous message I gave only one line from the pt file, so your result is correct.

   i   j    var              Xi               Yi
1 T1  T1 291058 2,3428828903353 48,8566258524875
2 T1 T10   8297 2,3428828903353 48,8566258524875
3 T1 T11   3889 2,3428828903353 48,8566258524875
4 T1 T12  17064 2,3428828903353 48,8566258524875

The left_join function works fine on its own, BUT
I created this function:

my_function <- function(tab, pt, origin, id, x, y) {
  join <- tab %>%
    left_join(pt, by = c(origin = id)) %>%
    rename(Xi = x, Yi = y)
}

When I call my function, I pass the arguments as follows:
join <- my_function(tab, pt, origin = "i", id = "CODE", x = "X", y = "Y")
That's when I get the error message.

I think it is not possible to pass two variables declared as function arguments to by in left_join. What do you think?

So I conclude that left_join cannot be used inside my function with two variables in "by".

I used another method.
I can still declare the two variables, origin and id, as arguments of my function.
Their values will be supplied by the user of the function.

   tab = data.frame(tab, pt[match(tab[, origin], pt[, id]), 2:3])
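
The match() line above depends on the coordinates sitting in columns 2 and 3 of pt. As a hedged alternative that also stays in base R, merge() accepts the key names as strings through by.x and by.y. This is only a sketch: the helper name join_base and the small sample data are made up for illustration, and merge() may reorder rows by the key, unlike match().

```r
# Sample data modelled on the tables in this thread (values shortened)
tab <- data.frame(i = rep("T1", 3),
                  j = c("T1", "T10", "T11"),
                  var = c(291058, 8297, 3889), stringsAsFactors = FALSE)
pt <- data.frame(CODE = "T1", X = 2.34, Y = 48.86, stringsAsFactors = FALSE)

# Hypothetical helper: origin and id are column names given as strings
join_base <- function(tab, pt, origin, id) {
  # all.x = TRUE keeps every row of tab, like a left join
  merge(tab, pt, by.x = origin, by.y = id, all.x = TRUE)
}

res <- join_base(tab, pt, origin = "i", id = "CODE")
res
```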

I think this does what you want, though I think it is an ugly hack.

library(dplyr)
my_function <- function(tab, pt, origin, id, x, y) {
  byVec = c(id)
  names(byVec) <- origin
  tab %>%
    left_join(pt, by=byVec) %>%
    rename(Xi = x, Yi = y)
  }
DF1 <- data.frame(i = rep("T1", 5),
                  j = c("T1", "T10", "T11", "T12", "T2"),
                  var = c(291058, 8297, 3889, 17064, 12163), stringsAsFactors = FALSE)
DF2 = data.frame(CODE = "T1", X = 2.34, Y = 48.86, stringsAsFactors = FALSE)

my_function(DF1, DF2, origin = "i", id = "CODE", x = "X", y = "Y")
#>    i   j    var   Xi    Yi
#> 1 T1  T1 291058 2.34 48.86
#> 2 T1 T10   8297 2.34 48.86
#> 3 T1 T11   3889 2.34 48.86
#> 4 T1 T12  17064 2.34 48.86
#> 5 T1  T2  12163 2.34 48.86

Created on 2019-12-29 by the reprex package (v0.3.0.9000)
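
On more recent versions of dplyr, passing a character variable straight to rename() as above may emit a note or warning about using an external vector in a selection. A hedged variant, assuming dplyr 1.0.0 or later, wraps the strings in all_of(). The name my_function2 is only illustrative.

```r
library(dplyr)

my_function2 <- function(tab, pt, origin, id, x, y) {
  # Build the named vector c(<origin> = <id>) from the string arguments
  by_vec <- id
  names(by_vec) <- origin
  tab %>%
    left_join(pt, by = by_vec) %>%
    # all_of() states explicitly that x and y hold column names
    rename(Xi = all_of(x), Yi = all_of(y))
}

DF1 <- data.frame(i = rep("T1", 5),
                  j = c("T1", "T10", "T11", "T12", "T2"),
                  var = c(291058, 8297, 3889, 17064, 12163), stringsAsFactors = FALSE)
DF2 <- data.frame(CODE = "T1", X = 2.34, Y = 48.86, stringsAsFactors = FALSE)

res <- my_function2(DF1, DF2, origin = "i", id = "CODE", x = "X", y = "Y")
res
```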

The issue is that c(origin = id) uses the literal name origin as the element name instead of the value stored in the variable origin, so we just have to find another way to deliver the joining condition.

Since the by argument of the *_join functions actually takes a named character vector as the input, we can achieve this with setNames (or even set_names from purrr if you want to stay within the tidyverse).

library(dplyr)

DF1 <- data.frame(i = rep("T1", 5),
                  j = c("T1", "T10", "T11", "T12", "T2"),
                  var = c(291058, 8297, 3889, 17064, 12163), stringsAsFactors = FALSE)

DF2 <- data.frame(CODE = "T1", X = 2.34, Y = 48.86, stringsAsFactors = FALSE)

my_function <- function(tab, pt, origin, id) {
  tab %>% 
    left_join(pt, by = setNames(id, nm = origin)) %>% 
    rename(Xi = X, Yi = Y)
  }

my_function(DF1, DF2, origin = "i", id = "CODE")
#>    i   j    var   Xi    Yi
#> 1 T1  T1 291058 2.34 48.86
#> 2 T1 T10   8297 2.34 48.86
#> 3 T1 T11   3889 2.34 48.86
#> 4 T1 T12  17064 2.34 48.86
#> 5 T1  T2  12163 2.34 48.86

Note: When using setNames or set_names, the order of the arguments is inverted from the way we normally pass them to by. So the y table's column needs to be specified first and the x table's second (as the argument to nm).
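
A quick way to check the argument order is to compare the result of setNames() with the hand-written form of by. This is a small base R sketch; the second pair of column names ("j" and "CODE2") is hypothetical and only there to show that the same pattern extends to several join columns.

```r
origin <- "i"
id <- "CODE"

# y table's column first, x table's column as nm
by_a <- setNames(id, nm = origin)
by_b <- c("i" = "CODE")
identical(by_a, by_b)  # TRUE

# Several join columns (the second pair is a made-up example)
by_multi <- setNames(c("CODE", "CODE2"), nm = c("i", "j"))
names(by_multi)
```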

Great, so I can use dplyr after all.
Thank you!
Sbl

Hi @siddharthprabhu,

I am just curious about naming arguments in your function.

my_function <- function(DF1, DF2, origin, id) {
  DF1 %>%
    left_join(DF2, by = setNames(id, nm = origin)) %>%
    rename(Xi = X, Yi = Y)
}

This gives the same result here but to me is more intuitive, because left_join() combines two data.frames.
I tried to figure out what those two arguments, tab and pt, mean. Is it OK if I change them to DF1 and DF2?

regards.

@Andrzej I'm simply re-using the parameter names from the function given by the OP. You're right though that tab and pt are not very intuitive names and something like DF1, DF2 would be better.

Hi all. I agree with you.
Thanks
Sbl
