How to match every entry of one column of a list to multiple columns of another list

I have these data frames:

  1. brown_GO_terms
structure(list(`activation of innate immune response` = c("C1qbp", 
"Clec4e", "Hspd1", "Lgals9", "Nploc4", "Pik3ap1"), `adaptive immune response` = c("Alcam", 
"C1qbp", "C3", "Fcer1g", "Fcgr1", "Fzd5"), `adaptive immune response based on somatic recombination of immune receptors built from immunoglobulin superfamily domains` = c("C1qbp", 
"C3", "Fcer1g", "Fcgr1", "Fzd5", "H2-Ab1"), `cellular response to cytokine stimulus` = c("Actr2", 
"Adipor2", "Agfg1", "Bmi1", "Cacybp", "Casp4"), `cellular response to interferon-gamma` = c("Actr2", 
"Ccl3", "Gapdh", "Gbp7", "H2-Ab1", "Kif5b"), `cellular response to virus` = c("Ccl3", 
"Gapdh", "Gbp7", "H2-Ab1", "Kif5b", "Myc"), `cytokine production involved in immune response` = c("Ddx1", 
"Fcer1g", "Fzd5", "Hspd1", "Il10", "Il1b"), `cytokine secretion` = c("Casp4", 
"Ccl3", "Clec4e", "Fzd5", "Gapdh", "Hspd1"), `cytokine-mediated signaling pathway` = c("Adipor2", 
"Agfg1", "Casp4", "Ccl3", "Ccrl2", "Csf1"), `defense response` = c("Actr2", 
"Aoah", "Ap1g1", "Apod", "Becn1", "C1qbp"), `defense response to virus` = c("Becn1", 
"C1qbp", "Ddx1", "Il1b", "Nploc4", "Oas1a"), `I-kappaB kinase/NF-kappaB signaling` = c("Agfg1", 
"Ddx1", "Hacd3", "Il1b", "Mtdh", "Ripk2"), `immune system development` = c("Adrm1", 
"Agfg1", "Alg3", "Anxa2", "Bak1", "Bmi1"), `immunoglobulin production involved in immunoglobulin mediated immune response` = c("H2-Ab1", 
"Hspd1", "Nbn", "Rnf168", "Supt6", "Tnfsf4"), `innate immune response` = c("Actr2", 
"Ap1g1", "C1qbp", "C3", "Casp4", "Ccl3"), `isotype switching` = c("Hspd1", 
"Nbn", "Rnf168", "Supt6", "Tnfsf4", NA), `leukocyte activation` = c("Adrm1", 
"Alg3", "Ap1g1", "Bak1", "Bmi1", "Ccl3"), `leukocyte activation involved in immune response` = c("Alg3", 
"Ap1g1", "Ccl3", "Clec4e", "Fcer1g", "Grn"), `leukocyte apoptotic process` = c("Agfg1", 
"Bak1", "Fcer1g", "Il10", "Mif", "Myc"), `leukocyte differentiation` = c("Adrm1", 
"Agfg1", "Alg3", "Anxa2", "Bak1", "Bmi1"), `leukocyte mediated immunity` = c("Ap1g1", 
"C1qbp", "C3", "Ccl3", "Ddx1", "Fcer1g"), `lymphocyte activation` = c("Adrm1", 
"Alg3", "Ap1g1", "Bak1", "Bmi1", "Cdk6"), `lymphocyte activation involved in immune response` = c("Alg3", 
"Ap1g1", "Clec4e", "Fcer1g", "Hspd1", "Lgals9"), `negative regulation of cytokine production` = c("Apod", 
"C1qbp", "D1Ertd622e", "Fus", "Il10", "Lgals9"), `negative regulation of immune system process` = c("Apod", 
"C1qbp", "Ccl3", "Cdk6", "Cited2", "Dusp1"), `negative regulation of viral genome replication` = c("Oas1a", 
"Tnf", "Trim6", "Vapa", NA, NA), `negative regulation of viral life cycle` = c("Nbn", 
"Oas1a", "Tnf", "Trim6", "Vapa", NA), `negative regulation of viral process` = c("Ccl3", 
"Nbn", "Oas1a", "Tnf", "Trim6", "Vapa"), `pattern recognition receptor signaling pathway` = c("C1qbp", 
"Clec4e", "Hspd1", "Lgals9", "Nploc4", "Pik3ap1"), `positive regulation of adaptive immune response` = c("C3", 
"Fcer1g", "Fcgr1", "Fzd5", "H2-Ab1", "Hspd1"), `positive regulation of adaptive immune response based on somatic recombination of immune receptors built from immunoglobulin superfamily domains` = c("C3", 
"Fcer1g", "Fcgr1", "Fzd5", "H2-Ab1", "Hspd1"), `positive regulation of cytokine production` = c("Agfg1", 
"C1qbp", "C3", "Casp4", "Ccl3", "Clec4e"), `positive regulation of cytokine production involved in immune response` = c("Ddx1", 
"Fcer1g", "Fzd5", "Il1b", "Kars", "Mif"), `positive regulation of cytokine-mediated signaling pathway` = c("Agfg1", 
"Casp4", "Csf1", "Ripk2", "Trim6", NA), `positive regulation of defense response` = c("Ap1g1", 
"C1qbp", "C3", "Ccl3", "Clec4e", "Fcer1g"), `positive regulation of immune effector process` = c("Ap1g1", 
"C3", "Ddx1", "Fcer1g", "Fcgr1", "Fzd5"), `positive regulation of immune response` = c("Alg3", 
"Ap1g1", "C1qbp", "C3", "Clec4e", "Ddx1"), `positive regulation of immune system process` = c("Agfg1", 
"Alg3", "Ap1g1", "Bmi1", "C1qbp", "C3"), `positive regulation of immunoglobulin mediated immune response` = c("C3", 
"Fcer1g", "Fcgr1", "Pvr", "Tnf", "Tnfsf4"), `positive regulation of innate immune response` = c("Ap1g1", 
"C1qbp", "Clec4e", "Hspd1", "Lgals9", "Nploc4"), `positive regulation of interleukin-12 production` = c("Hspd1", 
"Tnfsf4", NA, NA, NA, NA), `positive regulation of interleukin-6 production` = c("Fcer1g", 
"Hspd1", "Il1b", "Lgals9", "Ncl", "Ripk2"), `positive regulation of leukocyte differentiation` = c("Agfg1", 
"Bmi1", "Ccl3", "Csf1", "Gfi1b", "H2-Aa"), `positive regulation of leukocyte mediated immunity` = c("Ap1g1", 
"C3", "Ddx1", "Fcer1g", "Fcgr1", "Fzd5"), `positive regulation of lymphocyte differentiation` = c("Bmi1", 
"H2-Aa", "Hsp90aa1", "Il1b", "Lgals9", "Nfkbiz"), `positive regulation of lymphocyte mediated immunity` = c("Ap1g1", 
"C3", "Fcer1g", "Fcgr1", "Fzd5", "Hspd1"), `positive regulation of production of molecular mediator of immune response` = c("Ddx1", 
"Fcer1g", "Fzd5", "Il1b", "Kars", "Mif"), `positive regulation of response to cytokine stimulus` = c("Agfg1", 
"Casp4", "Csf1", "Ripk2", "Trim6", NA), `positive regulation of T cell mediated immunity` = c("Fzd5", 
"Hspd1", "Il1b", "Pnp", "Pvr", "Tnfsf4"), `positive regulation of tumor necrosis factor superfamily cytokine production` = c("Agfg1", 
"Ccl3", "Fcer1g", "Fzd5", "Hspd1", "Lgals9"), `positive regulation of viral life cycle` = c("Anxa2", 
"Chmp4b", "Hacd3", "Larp1", "Ppia", "Ppid"), `positive regulation of viral process` = c("Anxa2", 
"Chd1", "Chmp4b", "Hacd3", "Larp1", "Pfn1"), `regulation of adaptive immune response` = c("C3", 
"Fcer1g", "Fcgr1", "Fzd5", "H2-Ab1", "Hspd1"), `regulation of adaptive immune response based on somatic recombination of immune receptors built from immunoglobulin superfamily domains` = c("C3", 
"Fcer1g", "Fcgr1", "Fzd5", "H2-Ab1", "Hspd1"), `regulation of cell adhesion` = c("Apod", 
"Bmi1", "C1qbp", "Calr", "Cdk6", "Cited2"), `regulation of cell-cell adhesion` = c("Bmi1", 
"Cited2", "H2-Aa", "H2-Ab1", "Hsp90aa1", "Hspa4"), `regulation of cytokine production` = c("Agfg1", 
"Apod", "C1qbp", "C3", "Casp4", "Ccl3"), `regulation of cytokine production involved in immune response` = c("Ddx1", 
"Fcer1g", "Fzd5", "Il10", "Il1b", "Kars"), `regulation of cytokine secretion` = c("Casp4", 
"Ccl3", "Clec4e", "Fzd5", "Gapdh", "Hspd1"), `regulation of defense response` = c("Aoah", 
"Ap1g1", "Apod", "C1qbp", "C3", "Ccl3"), `regulation of defense response to virus` = c("C1qbp", 
"Il1b", "Nploc4", "Trim6", NA, NA), `regulation of I-kappaB kinase/NF-kappaB signaling` = c("Agfg1", 
"Ddx1", "Il1b", "Mtdh", "Ripk2", "Sirt1"), `regulation of immune effector process` = c("Ap1g1", 
"C1qbp", "C3", "Ddx1", "Fcer1g", "Fcgr1"), `regulation of immune response` = c("Alg3", 
"Ap1g1", "C1qbp", "C3", "Clec4e", "Ddx1"), `regulation of immune system process` = c("Adrm1", 
"Agfg1", "Alg3", "Ap1g1", "Apod", "Bmi1"), `regulation of immunoglobulin mediated immune response` = c("C3", 
"Fcer1g", "Fcgr1", "Pvr", "Supt6", "Tnf"), `regulation of inflammatory response to antigenic stimulus` = c("C3", 
"Fcer1g", "Fcgr1", "Il10", "Psma1", "Psmb4"), `regulation of innate immune response` = c("Ap1g1", 
"C1qbp", "Clec4e", "Grn", "Hspd1", "Lgals9"), `regulation of interleukin-12 production` = c("C1qbp", 
"Hspd1", "Il10", "Tnfsf4", NA, NA), `regulation of leukocyte apoptotic process` = c("Fcer1g", 
"Il10", "Mif", "Myc", "Noc2l", "Nr4a3"), `regulation of leukocyte cell-cell adhesion` = c("Bmi1", 
"H2-Aa", "H2-Ab1", "Hsp90aa1", "Hspa4", "Hspd1"), `regulation of leukocyte mediated immunity` = c("Ap1g1", 
"C3", "Ddx1", "Fcer1g", "Fcgr1", "Fzd5"), `regulation of lymphocyte differentiation` = c("Adrm1", 
"Bmi1", "H2-Aa", "Hsp90aa1", "Il1b", "Lgals9"), `regulation of lymphocyte mediated immunity` = c("Ap1g1", 
"C3", "Fcer1g", "Fcgr1", "Fzd5", "Hspd1"), `regulation of production of molecular mediator of immune response` = c("Ddx1", 
"Fcer1g", "Fzd5", "Il10", "Il1b", "Kars"), `regulation of T cell differentiation` = c("Adrm1", 
"Bmi1", "H2-Aa", "Hsp90aa1", "Il1b", "Lgals9"), `regulation of T cell mediated cytotoxicity` = c("Pnp", 
"Pvr", NA, NA, NA, NA), `regulation of T cell mediated immunity` = c("Fzd5", 
"Hspd1", "Il1b", "Pnp", "Pvr", "Tnfsf4"), `regulation of viral genome replication` = c("Hacd3", 
"Larp1", "Oas1a", "Ppia", "Ppid", "Ppie"), `regulation of viral life cycle` = c("Anxa2", 
"Chmp4b", "Hacd3", "Larp1", "Nbn", "Oas1a"), `regulation of viral process` = c("Anxa2", 
"Ccl3", "Chd1", "Chmp4b", "Hacd3", "Larp1"), `regulation of viral transcription` = c("Ccl3", 
"Chd1", "Pfn1", NA, NA, NA), `response to cytokine` = c("Actr2", 
"Adipor2", "Agfg1", "Bmi1", "Cacybp", "Casp4"), `response to interferon-gamma` = c("Actr2", 
"Ccl3", "Gapdh", "Gbp7", "H2-Aa", "H2-Ab1"), `response to molecule of bacterial origin` = c("Ccl3", 
"Fzd5", "Hspd1", "Il10", "Il1b", "Lgals9"), `response to type I interferon` = c("Ptpn2", 
"Shmt2", "Trim6", NA, NA, NA), `response to virus` = c("Becn1", 
"C1qbp", "Cct5", "Cdk6", "Ddx1", "Gtf2f1"), `somatic diversification of immune receptors` = c("Hspd1", 
"Nbn", "Polb", "Rnf168", "Supt6", "Tnfsf4"), `somatic diversification of immune receptors via germline recombination within a single locus` = c("Hspd1", 
"Nbn", "Polb", "Rnf168", "Supt6", "Tnfsf4"), `somatic diversification of immunoglobulins` = c("Hspd1", 
"Nbn", "Polb", "Rnf168", "Supt6", "Tnfsf4"), `T cell activation` = c("Adrm1", 
"Alg3", "Bmi1", "Cdk6", "Clec4e", "Fcer1g"), `T cell differentiation` = c("Adrm1", 
"Bmi1", "Cdk6", "Clec4e", "Fcer1g", "Fzd5"), `T cell mediated immunity` = c("Fzd5", 
"Hspd1", "Il1b", "Pnp", "Pvr", "Tnfsf4"), `tumor necrosis factor superfamily cytokine production` = c("Agfg1", 
"Ccl3", "Fcer1g", "Fzd5", "Hspd1", "Il10"), `type I interferon production` = c("Hspd1", 
"Nploc4", NA, NA, NA, NA), `type I interferon signaling pathway` = c("Ptpn2", 
"Trim6", NA, NA, NA, NA), `viral gene expression` = c("Ccl3", 
"Chd1", "Denr", "Eif3d", "Pfn1", "Ssb"), `viral genome replication` = c("Hacd3", 
"Larp1", "Oas1a", "Pcbp1", "Ppia", "Ppid"), `viral life cycle` = c("Anxa2", 
"Chmp4b", "Hacd3", "Ist1", "Larp1", "Nbn"), `viral process` = c("Agfg1", 
"Anxa2", "Bak1", "Ccl3", "Chd1", "Chmp4b")), row.names = c(NA, 
6L), class = "data.frame")

  2. brown_GO_hub_genes
structure(list(GeneSymbol = c("Srgn", "Pdia3", "Eif4g1", "Nr4a3", 
"Calr", "Ppp1cb", "Hspa5", "Ccl3", "Myc", "Dusp1", "Ptges3", 
"Psma4", "Grn", "Gapdh", "Srsf7", "Gbp7", "Cdk6", "Lgals9", "Tnf", 
"Alcam", "Ppia", "Suz12", "Cdc34", "Socs3", "Ssb", "Rrs1", "Lgmn", 
"Noc2l", "Hspa4", "Hk2", "Supt6", "Ncl", "Kif5b", "Pim1", "Stip1", 
"Casp4", "Ggta1", "Eif6", "Il10")), row.names = c(NA, -39L), class = "data.frame")

I want to match every entry of brown_GO_hub_genes against every column of brown_GO_terms. In the end I would like a table with the GeneSymbol column from brown_GO_hub_genes and, next to it, a column that lists for each row the column names of brown_GO_terms in which that GeneSymbol appears.

So I would like to have something like this (I don't know if the syntax below is correct, but I hope it shows what I want):

df_result <- structure(list(columns = (GeneSymbol, colnames_brown_GO_terms), rows = (entries_brown_GO_hub_genes, colnames_brown_GO_terms)))

The code I used so far:

brown_GO_terms <- as.data.frame(brown_GO_terms)
brown_GO_hub_genes <- as.data.frame(brown_GO_hub_genes)
brown_GO_terms_hub_genes <- match(brown_GO_hub_genes, brown_GO_terms, brown_GO_hub_genes %in% brown_GO_terms)
brown_GO_terms_hub_genes <- as.data.frame(brown_GO_terms_hub_genes)

The wrong result I got:

structure(list(brown_GO_terms_hub_genes = 0L), row.names = c(NA, 
-1L), class = "data.frame")

I hope someone can help me with this problem.

Thanks a lot!

Could you elaborate a little on your question? It is not clear what your desired output would be. Also, there are some problems with your first data frame: the column names have been read as a row.
Ideally, could you turn this into a minimal REPRoducible EXample (reprex)? A reprex makes it much easier for others to understand your issue and figure out how to help.

If you've never heard of a reprex before, you might want to start by reading this FAQ:

Hi @andresrcs, thank you so much for the reply. I have modified the post a bit. I hope it is clearer now.

Well, I think it is clear enough, but it is not a proper reprex yet. If you are going to keep asking questions here, I strongly recommend that you work on making proper reproducible examples; that would greatly increase your chances of getting help.

This is an example with a smaller subset of your dataframe, that shows how to do what you want.

library(tidyverse)

brown_GO_terms <- structure(
    list(
        `activation of innate immune response` = c("C1qbp", "Clec4e", "Hspd1", "Lgals9", "Nploc4", "Pik3ap1"),
        `adaptive immune response` = c("Alcam", "C1qbp", "C3", "Fcer1g", "Fcgr1", "Fzd5"),
        `adaptive immune response based on somatic recombination of immune receptors built from immunoglobulin superfamily domains` = c("C1qbp", "C3", "Fcer1g", "Fcgr1", "Fzd5", "H2-Ab1"),
        `cellular response to cytokine stimulus` = c("Actr2", "Adipor2", "Agfg1", "Bmi1", "Cacybp", "Casp4"),
        `cellular response to interferon-gamma` = c("Actr2", "Ccl3", "Gapdh", "Gbp7", "H2-Ab1", "Kif5b")
    ),
    row.names = c(NA, 6L),
    class = "data.frame"
)


brown_GO_hub_genes <- structure(
    list(
        GeneSymbol = c("Srgn", "Pdia3", "Eif4g1", "Nr4a3", 
                               "Calr", "Ppp1cb", "Hspa5", "Ccl3", "Myc", "Dusp1", "Ptges3", 
                               "Psma4", "Grn", "Gapdh", "Srsf7", "Gbp7", "Cdk6", "Lgals9", "Tnf", 
                               "Alcam", "Ppia", "Suz12", "Cdc34", "Socs3", "Ssb", "Rrs1", "Lgmn", 
                               "Noc2l", "Hspa4", "Hk2", "Supt6", "Ncl", "Kif5b", "Pim1", "Stip1", 
                               "Casp4", "Ggta1", "Eif6", "Il10")),
     row.names = c(NA, -39L),
     class = "data.frame"
)

brown_GO_terms %>% 
    gather(Term, GeneSymbol, everything()) %>%
    right_join(brown_GO_hub_genes) %>% 
    select(GeneSymbol, Term)
#> Joining, by = "GeneSymbol"
#>    GeneSymbol                                   Term
#> 1        Srgn                                   <NA>
#> 2       Pdia3                                   <NA>
#> 3      Eif4g1                                   <NA>
#> 4       Nr4a3                                   <NA>
#> 5        Calr                                   <NA>
#> 6      Ppp1cb                                   <NA>
#> 7       Hspa5                                   <NA>
#> 8        Ccl3  cellular response to interferon-gamma
#> 9         Myc                                   <NA>
#> 10      Dusp1                                   <NA>
#> 11     Ptges3                                   <NA>
#> 12      Psma4                                   <NA>
#> 13        Grn                                   <NA>
#> 14      Gapdh  cellular response to interferon-gamma
#> 15      Srsf7                                   <NA>
#> 16       Gbp7  cellular response to interferon-gamma
#> 17       Cdk6                                   <NA>
#> 18     Lgals9   activation of innate immune response
#> 19        Tnf                                   <NA>
#> 20      Alcam               adaptive immune response
#> 21       Ppia                                   <NA>
#> 22      Suz12                                   <NA>
#> 23      Cdc34                                   <NA>
#> 24      Socs3                                   <NA>
#> 25        Ssb                                   <NA>
#> 26       Rrs1                                   <NA>
#> 27       Lgmn                                   <NA>
#> 28      Noc2l                                   <NA>
#> 29      Hspa4                                   <NA>
#> 30        Hk2                                   <NA>
#> 31      Supt6                                   <NA>
#> 32        Ncl                                   <NA>
#> 33      Kif5b  cellular response to interferon-gamma
#> 34       Pim1                                   <NA>
#> 35      Stip1                                   <NA>
#> 36      Casp4 cellular response to cytokine stimulus
#> 37      Ggta1                                   <NA>
#> 38       Eif6                                   <NA>
#> 39       Il10                                   <NA>

Created on 2019-09-21 by the reprex package (v0.3.0.9000)
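
As a variation on the pipeline above, here is a rough sketch that uses `pivot_longer()` (which superseded `gather()` in tidyr 1.0.0) and collapses all matching terms into a single row per gene, since in the full data a gene can appear under several terms. This is not the code from the reprex: the three-row subset, the object names `go_terms` and `hub_genes`, the `Terms` column name and the `"; "` separator are all assumptions chosen for illustration, and it assumes dplyr 1.0.0 or later for the `.groups` argument.

```r
library(dplyr)
library(tidyr)

# Toy subset: first three genes of two terms from the original data frame
go_terms <- data.frame(
  `cytokine secretion` = c("Casp4", "Ccl3", "Clec4e"),
  `cellular response to interferon-gamma` = c("Actr2", "Ccl3", "Gapdh"),
  check.names = FALSE  # keep the spaces in the column names
)

# Toy subset of the hub genes; "Srgn" has no matching term here
hub_genes <- data.frame(GeneSymbol = c("Srgn", "Ccl3", "Gapdh", "Casp4"))

result <- go_terms %>%
  # Long format: one row per (Term, GeneSymbol) pair, dropping NA padding
  pivot_longer(everything(),
               names_to = "Term",
               values_to = "GeneSymbol",
               values_drop_na = TRUE) %>%
  # Keep every hub gene, matched or not
  right_join(hub_genes, by = "GeneSymbol") %>%
  group_by(GeneSymbol) %>%
  # One row per gene; unmatched genes stay NA instead of the string "NA"
  summarise(
    Terms = if (all(is.na(Term))) NA_character_ else paste(sort(Term), collapse = "; "),
    .groups = "drop"
  )

print(result)
```

Whether one row per gene or one row per gene/term pair is more useful depends on what happens next; dropping the `group_by()` and `summarise()` steps gives the long form.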

Wow, thanks a lot @andresrcs. I read and tried to understand the reprex workflow you posted before, and I will work on improving mine. Could you tell me what, in this case, was not done in a proper reprex way?
I'm going to try the code!

The problems with your reprex that I can see are:

  • Incorrect code formatting, making code hard to copy.
  • The sample data is bigger than needed, a reprex has to be minimal.
  • The example is not self-contained.
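
To illustrate those three points, a minimal self-contained version of this question might look roughly like the sketch below. The object names and the tiny data subset are assumptions made for illustration; the point is only that the data is small, is created inside the code, and the failing call can be copied and run as is.

```r
library(tibble)

# Small sample data: just enough rows and columns to show the problem
go_terms <- tribble(
  ~`cytokine secretion`, ~`defense response`,
  "Casp4",               "Actr2",
  "Ccl3",                "Aoah",
  "Clec4e",              "Ap1g1"
)

hub_genes <- tribble(
  ~GeneSymbol,
  "Ccl3",
  "Srgn"
)

# The attempt that does not give the hoped-for result: match() compares
# whole columns (list elements), not the individual gene symbols
attempt <- match(hub_genes, go_terms)
print(attempt)
```

With the code on the clipboard, `reprex::reprex()` should render it together with its output, ready to paste into a post.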

OK, sorry, I am really a novice at coding and am still "learning by doing" the correct language. Next time I will use the "Paste as tribble" addin, which should make my life much easier.
Thank you again for the help!

Sorry @andresrcs, which function is %>%? It gave me this error:

Error in brown_GO_terms %>% gather(Term, GeneSymbol, everything()) %>% : could not find function "%>%"

Have you noticed that the example I gave you starts with library(tidyverse)? You have to load the libraries before using their functions.
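
Installing a package and loading it are separate steps, and `library()` fails if the package was never installed. A small sketch of checking before loading is shown below; it assumes that dplyr and tidyr (both part of the tidyverse) are enough for the pipeline above, and the variable names are illustrative.

```r
# Packages the pipeline above relies on (both are part of the tidyverse)
needed <- c("dplyr", "tidyr")

# requireNamespace() returns FALSE instead of raising an error
# when a package is not installed
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)

if (all(installed)) {
  # Attaching dplyr also makes the %>% pipe available
  library(dplyr)
  library(tidyr)
} else {
  # Tell the user which packages are missing rather than installing silently
  message(
    "Install first: install.packages(c(",
    paste0('"', needed[!installed], '"', collapse = ", "),
    "))"
  )
}
```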

OK, sorry. I noticed that I had to use that package; I thought I had already installed it, but I had not.

Ok it worked now :sweat_smile::wink:
