simple problem ifelse

Hi Rcommunity

got 2 dataframes:

df1<- structure(list(v1 = c("a", "b", "a"), v2 = c("b", "d", "e"),
rule = c("COE", "COE", "COE")), class = "data.frame", row.names = c(NA,
-3L))

df2 <- structure(list(v1 = c("a", "c", "g"), v2 = c("b", "e", "d")), class = "data.frame", row.names = c(NA,
-3L))

Im trying to match the rows in df1 and df2 , the code:

regeltest<- df1 %>% mutate(
rule = ifelse(v1 %in% df2$v1 & v2 %in% df2$v2, "COE",0),
lead = ifelse(v1 %in% df2$v1 & v2 %in% df2$v2, df1$rule, "ML"))

so when v1 and v2 match in both df1 and df2 it mutates COE and otherwise 0, same for lead

the correct answer should look like:
correct <- structure(list(v1 = c(2, 3, 2), v2 = c(3, 4, 5), rule = c("COE",
"0", "0"), lead = c("1", "ML", "ML")), class = "data.frame", row.names = c(NA,
-3L))

so when v1 and v2 match in both df1 and df2 it mutates COE and otherwise 0, same for lead

I am not sure I follow. Would you mind following up with additional clarity as to exactly what you're trying to do with this and ifelse statement or something else?.

The values in v1 and v2 in correct are integers, but appear at characters in df1 and df2. So it's hard to use that as a guide for the operations required to go from your initial two data frames to correct.

As nirgrahamuk mentioned in ifelse statement or something else?, set operations like those discussed in 13 Relational data | R for Data Science are a good way to approach this problem. For example, ``` r

library(dplyr)
df1<- structure(list(v1 = c("a", "b", "a"), v2 = c("b", "d", "e"),
                     rule = c("COE", "COE", "COE")), class = "data.frame", row.names = c(NA,
                                                                                         -3L))

df2 <- structure(list(v1 = c("a", "c", "g"), v2 = c("b", "e", "d")), class = "data.frame", row.names = c(NA,-3L))                                                                                                                                                                                 
#those in both
intersect(df1 %>% select(v1, v2), 
          df2 %>% select(v1, v2))
#>   v1 v2
#> 1  a  b

# in df1 but not in df2
setdiff(df1 %>% select(v1, v2), 
        df2 %>% select(v1, v2))
#>   v1 v2
#> 1  b  d
#> 2  a  e

# in df2 but not in df1
setdiff(df2 %>% select(v1, v2),
        df1 %>% select(v1, v2))
#>   v1 v2
#> 1  c  e
#> 2  g  d

Created on 2021-12-20 by the reprex package (v2.0.0)

Hi EconomiCurtis

So what i want is something that looks at each row in the columns v1 and v2 for df1 and df2. if they match then we add a column named "rule" and in that column we write "COE" (its a business rule). else we add 0 in the rule column.. (its the mutate func in the original post).

i used the ifelse condition and the & operator, in "regeltest" at the original post and got the result:
image

the 3rd row from df1 and df2 column v1 and v2 does not match so the correct table should be a equal to 0 at row 3 column "rule" and the column "lead" row 3 should be "ML"

I think what you're looking for is:

regeltest<- df1 %>%
   mutate(
     rule = ifelse(v1 == df2$v1 & v2 == df2$v2, "COE",0),
     lead = ifelse(v1 == df2$v1 & v2 == df2$v2, df1$rule, "ML"))

That is, you want to use == not %in%.

== is an operator meaning something like "exactly equal to"

Maybe running something like this can help illustrate the difference:

a <- c(1,3,5)

1 == a

1 %in% a

Hope that helps!

Luke

ps I personally would avoid doing this comparison across separate dataframes. Why not join or bind the data you need? It's much easier to verify the integrity and do any error checking, etc.

Hi Luke

one problem with that as I see it, it check for matches row wise:

df2 row 3 matches df1 row 2 but is not caught.

why bind the data - like row bind or ? how would u do it?

Br. Rasmus

Yes. Sorry. I thought this is what you were trying to do. Earlier you said:

Are you trying to match the row-wise combination of v1 and v2 to any row in the second dataframe?

For example, row 1 in df1 has v1 == "a" and v2 == "b" and the rule you want is based upon whether this combination exists in any row of df2?

Does that sound right?

"Are you trying to match the row-wise combination of v1 and v2 to any row in the second dataframe?"

Yes that is correct

Got it. In this case, I'd look at the various dplyr join functions.

You probably want either a left_join (which will keep alll rows in the left (first listed) dataframe) or a full_join (which will keep all the rows whether or not there are any matches). Check out the documentation on these functions. They are very helpful for this type of thing.

Try this:


full_join(df1,
          df2,
          by = c("v1","v2"))

This will find any cases where v1 and v2 both match between the two data frames. It will keep all values.

I'm not sure exactly of your further needs, but I think this is a good first step. You can then apply the rules you need, and you have the data together in one df.

(To stay as close to your original conception as possible, you could mutate() a new variable in each df using paste0. That would look something like the below. I'd definitely follow the join path outlined above though.

df1 <- df1 %>% 
  mutate(v1v2 = paste0(v1,v2))

df2 <- df2 %>% 
  mutate(v1v2 = paste0(v1,v2))

regeltest<- df1 %>% mutate(
  rule = ifelse(v1v2 %in% df2$v1v2, "COE",0),
  lead = ifelse(v1v2 %in% df2$v1v2, df1$rule, "ML"))

Thanks,

Luke

The intersect function works but i'm only getting the intersect as output, is it possible to combine intersect with ifelse statement?

regeltest<- df1 %>% mutate(
rule = ifelse(v1 %in% df2$v1 & v2 %in% df2$v2, "COE",0),
lead = ifelse(v1 %in% df2$v1 & v2 %in% df2$v2, df1$rule, "ML"))

so that if df1 and df2 intersect then it list the intersect and the non intersect and adds a new column named rule with"COE" if it intersects and else 0. And the same for lead so if it intersect it post Lead else 0?

jak213,

It's a bit tricky to write a good mutate() statement without knowing what the actual data might be like.

However, if you do the full join I suggested above, you would get:

v1 v2 rule
1  a  b  COE
2  b  d  COE
3  a  e  COE
4  c  e <NA>
5  g  d <NA>

You see the "NA" under rule because those rows were in df2, but not df1, and df1 does not have a "rule" column.

If the data were to be like this you could replace the NA with a 0, in a variety of ways. For example:

mutate(rule = ifelse(is.na(rule), "0", rule)

You could make a similar mutate() statement to generate your lead variable.

If I were doing this (but this is biased toward the work I do) I would add a marker to each dataframe, then join them. Something like this:

df1 <- tibble(v1 = c("a", "b", "a"),
              v2 = c("b", "d", "e"),
              rule = c("COE", "COE", "COE"),
              df1 = (TRUE))

df2 <- tibble(v1 = c("a", "c", "g"),
              v2 = c("b", "e", "d"),
              df2 = TRUE)

df <- full_join(df1,
                df2,
                by = c("v1","v2"))

(I'm also using tibble() to create the dataframe/tibble, since again, this is what I would do.)

Anyway, you get this:

  v1    v2    rule  df1   df2  
  <chr> <chr> <chr> <lgl> <lgl>
1 a     b     COE   TRUE  TRUE 
2 b     d     COE   TRUE  NA   
3 a     e     COE   TRUE  NA   
4 c     e     NA    NA    TRUE 
5 g     d     NA    NA    TRUE 

Then you can write your mutate functions along the lines of:

case_when(df1 == TRUE & df2 == TRUE ~ "Result you want when v1 and v2 combination is in both dataframes",
          df1 == TRUE & df2 == FALSE ~ "Result you want when v1 and v2 combination is ONLY in df1"
... 

I hope that helps.

Luke

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.