intersect data in R

bioinfonext · January 14, 2023, 11:33am

Hi all,

I am trying to find the rows that two files have in common, based on the AlphaMarkerName column. Could you please help me get those rows and view them?

> genie = read.table(gzfile("EGFR_T1D_ALL_MIN_May2022.z_score.txt.gz"),sep="", header= TRUE)
> winker = read.table(gzfile("winker.updated.z_score.txt.gz"),sep="", header = TRUE)
> head(winker, 10) [,1:5]
    AlphaMarkerName       rsid Allele1 Allele2 Freq1.dm
1   10:10000018:A:G  rs6602381       A       G     0.59
2  10:100000625:A:G  rs7899632       A       G     0.56
3  10:100000645:A:C rs61875309       A       C     0.81
4  10:100003242:G:T rs12258651       T       G     0.84
5  10:100003304:A:G rs72828461       A       G     0.96
6  10:100003785:C:T  rs1359508       T       C     0.61
7  10:100004360:A:G  rs1048754       A       G     0.19
8  10:100004441:C:G  rs1048757       C       G     0.59
9  10:100004906:A:C  rs3750595       A       C     0.43
10 10:100004996:A:G  rs2025625       A       G     0.39

> head(genie, 10) [,12:17]
   HetISq HetChiSq HetDf HetPVal all_total AlphaMarkerName
1    47.7    1.911     1  0.1669      1768    1:748878:G:T
2     0.0    0.571     2  0.7515      2189  1:749963:T:TAA
3     0.0    1.101     2  0.5766      2189    1:751343:A:T
4     0.0    0.894     2  0.6397      2189   1:751488:G:GA
5     0.0    1.090     2  0.5799      2189    1:751756:C:T
6     0.0    1.068     2  0.5863      2189    1:752566:A:G
7     0.0    1.571     2  0.4559      2189    1:752721:A:G
8     0.0    0.915     2  0.6330      2189    1:752894:C:T
9     0.0    1.129     2  0.5687      2189    1:753405:A:C
10    0.0    1.176     2  0.5553      2189    1:753425:C:T
> common <- intersect(genie$AlphaMarkerName, winker$AlphaMarkerName)

Many thanks,

nirgrahamuk · January 14, 2023, 11:50am

What you describe is called an inner join; dplyr has an inner_join() function for that.
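For context, here is a minimal base R sketch of why intersect() on its own does not return rows: it gives back only the shared key values, which can then be used to subset each table. The data frames genie_toy and winker_toy and all their values are made up for illustration; they are not the poster's data.

```r
# Toy stand-ins for the two tables (hypothetical values, illustration only)
genie_toy <- data.frame(
  AlphaMarkerName = c("1:100:A:G", "1:200:C:T", "1:300:A:C"),
  z_score = c(0.5, -1.2, 2.0),
  stringsAsFactors = FALSE
)
winker_toy <- data.frame(
  AlphaMarkerName = c("1:200:C:T", "1:300:A:C", "1:400:G:T"),
  z_score = c(1.1, 0.3, -0.7),
  stringsAsFactors = FALSE
)

# intersect() returns a character vector of shared keys, not rows
common <- intersect(genie_toy$AlphaMarkerName, winker_toy$AlphaMarkerName)

# Use the shared keys to pull the matching rows from each table
genie_common  <- genie_toy[genie_toy$AlphaMarkerName %in% common, ]
winker_common <- winker_toy[winker_toy$AlphaMarkerName %in% common, ]

print(genie_common)
print(winker_common)
```

This keeps the two tables separate; an inner join instead combines the matching rows into a single data frame.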

bioinfonext · January 14, 2023, 12:02pm

I tried this, but it shows an error:

common2 <- inner_join(genie, winker, by = AlphaMarkerName)

jrkrideau · January 14, 2023, 12:21pm

That's what you would think it would be, but the column name has to be quoted. Try this:

# tidyverse
dat1 <- winker %>% inner_join(genie, by = "AlphaMarkerName")

# base R
dat1 <- merge(winker, genie, by = "AlphaMarkerName")
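As a small sketch of what the join produces, the base R example below merges two toy tables that share a non-key column name. The names winker_toy, genie_toy and the values are hypothetical. It shows where the .x and .y suffixes come from, and that merge() accepts a suffixes argument if more descriptive names are preferred.

```r
# Toy tables with a clashing non-key column (hypothetical values)
winker_toy <- data.frame(
  AlphaMarkerName = c("1:200:C:T", "1:300:A:C", "1:400:G:T"),
  z_score = c(1.1, 0.3, -0.7),
  stringsAsFactors = FALSE
)
genie_toy <- data.frame(
  AlphaMarkerName = c("1:100:A:G", "1:200:C:T", "1:300:A:C"),
  z_score = c(0.5, -1.2, 2.0),
  stringsAsFactors = FALSE
)

# Inner join on the key; clashing columns get .x (first table) and .y (second)
dat_default <- merge(winker_toy, genie_toy, by = "AlphaMarkerName")

# Same join, but with suffixes that name the source table
dat_named <- merge(winker_toy, genie_toy, by = "AlphaMarkerName",
                   suffixes = c(".winker", ".genie"))

print(names(dat_default))
print(names(dat_named))
```

Columns that exist in only one table keep their original names and get no suffix.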

bioinfonext · January 14, 2023, 12:28pm

Is there any way to write the *.x columns from this data frame to one file and the *.y columns to another? Both output data frames must keep the AlphaMarkerName column.

colnames(dat1)
 [1] "AlphaMarkerName" "rsid"            "Allele1.x"       "Allele2.x"      
 [5] "Freq1.dm"        "Effect.dm"       "StdErr.dm"       "P.value.dm"     
 [9] "n.dm"            "Freq1.x"         "Effect.x"        "StdErr.x"       
[13] "P.value.nodm"    "n.nodm"          "pdiff"           "pjoint"         
[17] "z_score.x"       "X1KG_ID"         "SNP_ID"          "CHR"            
[21] "POS"             "Allele1.y"       "Allele2.y"       "Freq1.y"        
[25] "Effect.y"        "StdErr.y"        "Pvalue"          "Direction"      
[29] "HetISq"          "HetChiSq"        "HetDf"           "HetPVal"        
[33] "all_total"       "z_score.y"

jrkrideau · January 14, 2023, 12:39pm

I think this does it

dat1 %>% select(AlphaMarkerName, ends_with(".x"))
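A rough base R counterpart that covers both halves and writes each to a file might look like the sketch below. The toy data and the output file names (dat_x.txt, dat_y.txt in a temporary directory) are assumptions for illustration. Note that selecting by suffix picks up only columns that clashed between the two tables; columns unique to one table (such as rsid or HetISq above) carry no suffix and would have to be added by name.

```r
# Toy joined table (hypothetical values); clashing columns carry .x / .y
winker_toy <- data.frame(
  AlphaMarkerName = c("1:200:C:T", "1:300:A:C"),
  z_score = c(1.1, 0.3),
  stringsAsFactors = FALSE
)
genie_toy <- data.frame(
  AlphaMarkerName = c("1:200:C:T", "1:300:A:C"),
  z_score = c(-1.2, 2.0),
  stringsAsFactors = FALSE
)
dat_toy <- merge(winker_toy, genie_toy, by = "AlphaMarkerName")

# Find the suffixed columns by name
x_cols <- grep("\\.x$", names(dat_toy), value = TRUE)
y_cols <- grep("\\.y$", names(dat_toy), value = TRUE)

# Keep the key column in both subsets
dat_x <- dat_toy[, c("AlphaMarkerName", x_cols), drop = FALSE]
dat_y <- dat_toy[, c("AlphaMarkerName", y_cols), drop = FALSE]

# Write each subset to its own tab-separated file (hypothetical paths)
x_file <- file.path(tempdir(), "dat_x.txt")
y_file <- file.path(tempdir(), "dat_y.txt")
write.table(dat_x, x_file, sep = "\t", quote = FALSE, row.names = FALSE)
write.table(dat_y, y_file, sep = "\t", quote = FALSE, row.names = FALSE)
```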
