Getting R to talk to BioMart

database
rstudio

#1

A "Data Carpentry" course I recently took in Cambridge briefly covered integrating R with databases, for example SQL databases. Unfortunately this came right at the end and the whole tutorial took about 5 minutes, so I am none the wiser.

My objective is to annotate a table of gene expression data for one organism (say, organism A) with information on orthologues from another (organism B), as an additional column in the table. I watched tutorials online (YouTube) on how to do this using the BioMart GUI. Many clicks involved...

Is there a way to achieve the same thing from R by pulling data from BioMart?

Thanks in advance for your help.

Miguel


#2

Can you phrase this in more general "database" terms? Is this a matter of joining data from one table with another based on a common key?

You also might want to check out the Databases Using R from RStudio page, which has a good overview, as well as more specific guidance.

There's also a webinar, if you prefer video:

https://www.rstudio.com/resources/videos/best-practices-for-working-with-databases-webinar/


#3

Hi @mara, thanks for your help. Yes, it is indeed a matter of joining my local data with what I assume is the orthologue mapping table in BioMart. My question is specific to BioMart, but I will check the references you suggested.
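In general database terms, the join described here could be sketched in base R roughly as follows. This is only an illustration: both tables, the gene IDs, and the column names (`gene_id`, `log2fc`, `orthologue_id`) are made up and do not come from BioMart.

```r
# Toy expression table for "organism A" (made-up gene IDs and values).
expression <- data.frame(
  gene_id = c("geneA1", "geneA2", "geneA3"),
  log2fc  = c(1.8, -0.4, 2.3),
  stringsAsFactors = FALSE
)

# Toy orthologue mapping table (made up); geneA3 has no orthologue listed.
orthologues <- data.frame(
  gene_id       = c("geneA1", "geneA2"),
  orthologue_id = c("geneB7", "geneB9"),
  stringsAsFactors = FALSE
)

# Left join on the common key: keep every row of the expression table
# and add the orthologue as an extra column (NA where there is no match).
annotated <- merge(expression, orthologues, by = "gene_id", all.x = TRUE)
print(annotated)
```

With `all.x = TRUE`, genes without a match are kept and get `NA` in the new column, which is usually what you want when annotating an existing table.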


#4

Oh, I didn't realise it was a database. When googling it, I found an R package specifically for interfacing with BioMart: biomaRt.


#5

Very useful, @mara. I installed the package and looked at the user's guide:

http://127.0.0.1:19184/library/biomaRt/doc/biomaRt.html#how-to-build-a-biomart-query

It mentions that "BioMart databases can contain several datasets, for Ensembl every species is a different dataset". I just wonder whether a biomaRt query allows orthologue mapping (inter-species queries), which is what the web interface can do.
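For what it's worth, a minimal sketch of what such an inter-species query might look like with biomaRt, assuming the fly dataset in Ensembl as "organism A" and human as "organism B". The helper name `fetch_human_orthologues` is hypothetical, and the dataset and attribute names are assumptions that should be checked with `listDatasets()` and `listAttributes()` before relying on them.

```r
library(biomaRt)

# Hypothetical helper (not part of biomaRt): look up human orthologues
# for a vector of fly Ensembl gene IDs. Needs an internet connection when called.
fetch_human_orthologues <- function(fly_gene_ids) {
  # Connect to the Ensembl mart and pick the fly dataset ("organism A").
  # Assumed dataset name; confirm with listDatasets().
  fly_mart <- useEnsembl(biomart = "ensembl",
                         dataset = "dmelanogaster_gene_ensembl")

  # Assumed attribute names for the human homologue columns;
  # confirm with listAttributes(fly_mart).
  getBM(
    attributes = c("ensembl_gene_id",
                   "external_gene_name",
                   "hsapiens_homolog_ensembl_gene",
                   "hsapiens_homolog_associated_gene_name"),
    filters = "ensembl_gene_id",
    values  = fly_gene_ids,
    mart    = fly_mart
  )
}

# Example call (commented out because it queries the Ensembl servers):
# orthologues <- fetch_human_orthologues(my_expression_table$gene_id)
```

The result is an ordinary data frame, so it can then be combined with a local expression table using `merge()` on the gene ID column. A gene can have more than one orthologue, so the returned table may contain several rows per gene. biomaRt also provides `getLDS()` for linking two datasets in one query, which may be another route to the same result.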


#6

Glad it's helpful! True confession: I have no clue what this means.

That's not to say no one here will, but you might find more domain-related help in the bioconductor forum for this one:
https://support.bioconductor.org/


#7

Sorry for not being clear, @mara. Orthologue mapping means looking up, for a gene in species A, its homologue in species B, so you end up with a pair of genes that perform the same function in the two species. For example, the gene "Ref2P" in flies is called P62 in humans.

I followed your advice and also posted a question in the bioconductor forum.


#8

Would you mind posting a link to your question in the Bioconductor support forums? That way, anybody who finds this discussion in the future can follow the thread.


#9

Of course, here is the link to the question: https://support.bioconductor.org/p/113863/