Merging two small data frames fails with `Error: cannot allocate vector of size 49.8 Gb`

Hi,

I have two data frames. df_1 is 1272464 bytes (about 1.2 Mb) with 65893 observations of 3 variables, and df_2 is 3507976 bytes (about 3.3 Mb) with 202732 observations of 2 variables.

I am using a function to sort the key columns of each data frame and then merging the two data frames. The code is given below:

sc <- function(x, i) setNames(cbind(data.frame(t(apply(x[i], 1, sort))), x[-i]), names(x))
res <- merge(sc(df_1, 1:2), sc(df_2, 1:2))

The code works properly on a small demo data frame, but on my real data it fails with Error: cannot allocate vector of size 49.8 Gb

I am not sure how these two small data frames end up needing 49.8 Gb. Could you give me any suggestions, please?
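
A quick sanity check on the number in the error message may help here. This is only a sketch under an assumption the thread does not confirm: that merge() ended up building a full cross join. The row counts are the ones reported above; the toy frames `a` and `b` and their column names `Query`/`Target` are hypothetical and only illustrate what merge() does when the inputs share no column names.

```r
# Row counts reported in the question
n1 <- 65893    # rows in df_1
n2 <- 202732   # rows in df_2

# A cross join needs one index per row pair; an R integer takes 4 bytes
pairs <- n1 * n2            # doubles, so no integer overflow
gib   <- pairs * 4 / 2^30   # R's "Gb" in this message means 2^30 bytes
print(round(gib, 1))

# merge() joins on the column names common to both inputs. When there are
# none, it returns the Cartesian product of the rows.
# Hypothetical toy frames whose names differ only by case:
a <- data.frame(query = c("A1", "A2"), target = c("A2", "A5"))
b <- data.frame(Query = c("A1", "A2", "A3"), Target = c("A2", "A5", "A1"))

shared <- intersect(names(a), names(b))   # columns merge() would join on
cross  <- merge(a, b)                     # no shared names: 2 x 3 row pairs
print(shared)
print(nrow(cross))
```

If the computed size lines up with the error, it is worth running intersect(names(df_1), names(df_2)) on the real data before merging.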

What do your datasets look like? You could run dput(head(df_1)) and dput(head(df_2)) and post the output so that we can try it.

@gueyenono, the output is huge even when I use the head function. However, the code works properly on a small demo data frame. For example:

df_1 <- read.table(text="query   target     weight
A1  A2  0.6
A2  A5  0.5
A3  A1  0.75
A4  A5  0.88
A5  A3  0.99
(+)-1(10),4-Cadinadiene     Falcarinone     0.09
Leucodelphinidin    (+)-1(10),4-Cadinadiene     0.876
Lignin  (2E,7R,11R)-2-Phyten-1-ol   0.778
(2E,7R,11R)-2-Phyten-1-ol   Leucodelphinidin    0.55
Falcarinone     Lignin  1
A1  (+)-1(10),4-Cadinadiene     1
A2  Lignin  1
A3  (2E,7R,11R)-2-Phyten-1-ol   1
Falcarinone  A6    1
A4  Leucodelphinidin    1
A4  Leucodelphinidin    1
Falcarinone  A100    1
A5  Falcarinone     1", header=TRUE)
df_2 <- read.table(text="query   target
A1  A2 
A2  A5
A1  A3  
A4  A5  
A3  A5  
(+)-1(10),4-Cadinadiene     Falcarinone    
Leucodelphinidin    (+)-1(10),4-Cadinadiene-100    
Lignin-2  (2E,7R,11R)-2-Phyten-1-ol   
A11  (+)-1(10),4-Cadinadiene    
A2  Lignin  
A3  (2E,7R,11R)-2-Phyten-1-0l 
Falcarinone  A6    
A4  Leucodelphinidin  ", header=TRUE)

But on the real data frames, the code shows the error.

I am attempting to find another way to get to your result but something seems odd to me in the way you wrote your function. For example, Leucodelphinidin is under the query column in df_2; however, after running your function: sc(df_2, 1:2), the entry becomes a target instead. I understand that this is due to you using the sort function on rows. Is that really what you want? Because this is basically changing the nature of your data.
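
To make the swap concrete, here is a minimal sketch that runs the same sc() helper on two rows taken from the df_1 example. The small frame `d` is just a stand-in for illustration.

```r
# The helper from the question, unchanged
sc <- function(x, i) setNames(cbind(data.frame(t(apply(x[i], 1, sort))), x[-i]), names(x))

# Two rows from the df_1 example; the second has query > target alphabetically
d <- data.frame(query  = c("A1", "A5"),
                target = c("A2", "A3"),
                weight = c(0.6, 0.99))

out <- sc(d, 1:2)

# Row 1 was already in sorted order, so it is unchanged.
# Row 2 went in as query = "A5", target = "A3" and comes out swapped.
print(d)
print(out)
```

The weight column stays attached to the same row, but the direction of the pair is lost for any row where the query sorts after the target.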

@gueyenono thank you very much for catching the mistake. No, I do not want to change the data from query to target or vice versa. That means I have to check the function again!

@akib62 In this case, you may want to use the left_join() or the inner_join() functions in dplyr.

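library(dplyr)

# Name the key columns explicitly instead of relying on auto-detected common names
left_join(df_1, df_2, by = c("query", "target"))   # keeps every row of df_1
inner_join(df_1, df_2, by = c("query", "target"))  # keeps only rows found in both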
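
Before joining the full data, it may be worth checking the key columns first. The sketch below uses base R merge() as a stand-in for the two dplyr joins so that it runs without extra packages. The small frames reuse a few rows from the example above, and `keys`, `dup_1`, and `dup_2` are names made up for the illustration.

```r
# Small stand-ins built from rows in the thread's example data
df_1 <- data.frame(query  = c("A1", "A2", "A4", "A4", "A5"),
                   target = c("A2", "A5", "Leucodelphinidin", "Leucodelphinidin", "A3"),
                   weight = c(0.6, 0.5, 1, 1, 0.99))
df_2 <- data.frame(query  = c("A1", "A2", "A1", "A4"),
                   target = c("A2", "A5", "A3", "Leucodelphinidin"))

keys <- c("query", "target")

# Stop early if a key column is missing, rather than getting a cross join
stopifnot(all(keys %in% names(df_1)), all(keys %in% names(df_2)))

# Repeated key pairs multiply rows in the result, so count them first
dup_1 <- sum(duplicated(df_1[keys]))
dup_2 <- sum(duplicated(df_2[keys]))

inner <- merge(df_1, df_2, by = keys)                 # comparable to inner_join()
left  <- merge(df_1, df_2, by = keys, all.x = TRUE)   # comparable to left_join()

print(c(dup_1 = dup_1, dup_2 = dup_2))
print(c(inner = nrow(inner), left = nrow(left)))
```

If both data frames contain many repeats of the same key pair, the joined result grows with the product of the repeat counts, so deduplicating with unique() first may be appropriate depending on what the duplicates mean in your data.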