How to do multiply column value from 2 dataframe after doing multiple match in R

akib62 · February 17, 2021, 12:17am

I have 2 dataframes. I want to match the cell value det_id from the df_1 with the Id_1 or Id_2 of the df_2. If I get any match then I will multiply the multiplier of df_1 with the sim of df_2. Dataframe structure is given below

df_1

  id                                  det_id multiplier
1  1     FID00119 _ FSID012160 _ Riboflavine       0.20
2  2     FID00099 _ FSID012160 _ Riboflavine       0.30
3  3          FID00120 _ FSID009721 _ Lignin       0.50
4  4 FID00120 _ FSID012362 _ beta-Sitosterol       0.20
5  5       FID00038 _ FSID013505 _ Taraxerol       0.70
6  6 FID00087 _ FSID012362 _ beta-Sitosterol       0.90
7  7     FID00094 _ FSID013269 _ Cholesterol       0.05

df_2

                                        Id_1                                    Id_2  sim
1                                    ID00309                                 ID00314 0.55
2                                    ID00309                                 ID00315 0.66
3                                    ID00309     FID00119 _ FSID012160 _ Riboflavine 0.97
4                                    ID00309     FID00099 _ FSID012160 _ Riboflavine 0.66
5                                    ID00310          FID00120 _ FSID009721 _ Lignin 0.55
6                                    ID00311 FID00120 _ FSID012362 _ beta-Sitosterol 0.34
7                                    ID00312       FID00038 _ FSID013505 _ Taraxerol 0.44
8                                    ID00313 FID00087 _ FSID012362 _ beta-Sitosterol 0.55
9                                    ID00313     FID00094 _ FSID013269 _ Cholesterol 0.23
10         FID00038 _ FSID013505 _ Taraxerol                                 ID00910 0.00
11 FID00120 _ FSID001304 _ alpha1-Sitosterol        FID00017 _ FSID004090 _ Atropine 1.00
12   FID00087 _ FSID012362 _ beta-Sitosterol       FID00038 _ FSID013505 _ Taraxerol 0.78

I have written a code that can match only Id_1 or Id_2 separately, not at the same time!

df_2$new_ssp <- df_2$multiplier[match(df_1$Id_1, df_1$det_id)] * df_2[-c(1, 2)]

or

df_2$new_ssp <- df_2$multiplier[match(df_2$Id_2, df_1$det_id)] * df_2[-c(1, 2)]

So, how can I solve the problems?

I want to match both Id_1 or Id_2 in the same code. For example, if I use my second code then I am getting NA for 10, 11 number rows of df_2. But, I want to match ID_1 also.
If there is not match like the 1 and 2 number rows of the df_2 then I want to put the sim value in the new column new_ssp.
If there are multiple matches like the 12 number row of the df_2 then I want to take the maximum multiplier from the df_1 (like 0.90) to multiply with sim of df_2.

Any kind of suggestion is appreciable.

Reproducible Dataset

df_1

structure(list(id = 1:7, det_id = structure(c(5L, 4L, 6L, 7L, 
1L, 2L, 3L), .Label = c("FID00038 _ FSID013505 _ Taraxerol", 
"FID00087 _ FSID012362 _ beta-Sitosterol", "FID00094 _ FSID013269 _ Cholesterol", 
"FID00099 _ FSID012160 _ Riboflavine", "FID00119 _ FSID012160 _ Riboflavine", 
"FID00120 _ FSID009721 _ Lignin", "FID00120 _ FSID012362 _ beta-Sitosterol"
), class = "factor"), multiplyer = c(0.2, 0.3, 0.5, 0.2, 0.7, 
0.9, 0.05)), class = "data.frame", row.names = c(NA, -7L))

df_2

structure(list(Id_1 = structure(c(4L, 4L, 4L, 4L, 5L, 6L, 7L, 
8L, 8L, 1L, 3L, 2L), .Label = c("FID00038 _ FSID013505 _ Taraxerol", 
"FID00087 _ FSID012362 _ beta-Sitosterol", "FID00120 _ FSID001304 _ alpha1-Sitosterol", 
"ID00309", "ID00310", "ID00311", "ID00312", "ID00313"), class = "factor"), 
    Id_2 = structure(c(9L, 10L, 6L, 5L, 7L, 8L, 2L, 3L, 4L, 11L, 
    1L, 2L), .Label = c("FID00017 _ FSID004090 _ Atropine", "FID00038 _ FSID013505 _ Taraxerol", 
    "FID00087 _ FSID012362 _ beta-Sitosterol", "FID00094 _ FSID013269 _ Cholesterol", 
    "FID00099 _ FSID012160 _ Riboflavine", "FID00119 _ FSID012160 _ Riboflavine", 
    "FID00120 _ FSID009721 _ Lignin", "FID00120 _ FSID012362 _ beta-Sitosterol", 
    "ID00314", "ID00315", "ID00910"), class = "factor"), sim = c(0.55, 
    0.66, 0.97, 0.66, 0.55, 0.34, 0.44, 0.55, 0.23, 0, 1, 0.78
    ), new_ssp = structure(list(sim = c(NA, NA, 0.194, 0.198, 
    0.275, 0.068, 0.308, 0.495, 0.0115, NA, NA, 0.546)), class = "data.frame", row.names = c(NA, 
    -12L))), row.names = c(NA, -12L), class = "data.frame")

akib62 · February 17, 2021, 1:37am

Solution found from StackOverflow!

system · February 24, 2021, 1:37am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.