I need so much help, it's not even funny.

Okay, so here I go. Fair warning: I'm probably not going to explain this in the most concise and effective way, so please bear with me.

I worked with a professor to write code in RStudio that reads a list in .txt format. This list came from flybase.org and is basically a list of genes and their functions. We used it to organize sample data from a mass spec analysis. The result was what I wanted: all the data I needed from the mass spec was organized into neat columns, and the output was an .xlsx file.

Now I want to convert every entry in column H from its "FBpp" ID to the corresponding "FBgn" ID, using a mapping list in .txt format that I also downloaded from flybase.org. I would also like to add another column with the matching gene symbol, which is also in that .txt list of IDs.

Finally, I would like to remove specific genes related to muscle tissue from the final result. The muscle proteins are contamination; they are not the proteins in the mass spec sample that I care about. I haven't yet figured out how to compile the list of muscle genes I need to remove from my results, but that's a problem for another day, in another forum.

A screenshot of my original code is posted below. As a new member I can only post one image. Any helpful thoughts or input?


Thanks in advance.

An image is actually not very useful. The best thing to post is a Reproducible Example (reprex). Here is an explanation of reprexes:

I suggest you show us some small data frames which have the crucial columns from your xlsx file and the .txt file that contains the mapping from FBpp ID to FBgn ID. If I understand your explanation, the examples only need to have a few rows. I am not sure if the gene symbol is in the same file as the ID mapping. If it is not, we will need an example of that also.
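As an illustration of what such a reprex could look like, here is a sketch that builds two tiny data frames by hand: one standing in for the mass spec results and one for the FBpp-to-FBgn mapping file. All column names (`protein_id`, `intensity`, `FBpp`, `FBgn`, `symbol`), IDs, and values are made-up placeholders, not taken from the real files; `protein_id` is assumed to play the role of column H.

```r
# Hypothetical stand-in for the .xlsx results; protein_id plays the role of column H.
# All IDs and values below are made-up placeholders.
results <- data.frame(
  protein_id = c("FBpp0000001", "FBpp0000002", "FBpp0000003"),
  intensity  = c(1520, 830, 4100)
)

# Hypothetical stand-in for the FlyBase .txt mapping file (one row per FBpp ID).
# Two polypeptides map to the same gene here, as isoforms of one gene would.
id_map <- data.frame(
  FBpp   = c("FBpp0000001", "FBpp0000002", "FBpp0000003"),
  FBgn   = c("FBgn0000010", "FBgn0000020", "FBgn0000020"),
  symbol = c("geneA", "geneB", "geneB")
)

# dput() prints R code that recreates each data frame, ready to paste into a post.
dput(results)
dput(id_map)
```

Pasting the `dput()` output (rather than a screenshot) lets anyone rebuild the exact same data with one copy and paste.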


Hi C_Lark,
Welcome to the RStudio Community.
It is worth the effort to learn how to make a reprex with about 10 rows and only the crucial columns.
It will really make a difference: once you know how to make a reprex,
you will be able to get a lot of help quickly.
Another resource for reprexing for beginners:

You may also be wondering how to make a tiny version of your dataset.
One way is:

```r
library(dplyr) # provides %>% and slice()

dataset %>%       # take dataset, then
  slice(1:10) %>% # keep the first 10 rows, then
  dput() ->       # print R code that recreates those rows, and assign them to
  df              # the object df
```

It also often helps to keep only the important variables with `select()` in this pipe.
It is likely that you can make the FBpp ID to FBgn ID link you want with `dplyr::left_join()`, and remove the muscle proteins with `dplyr::anti_join()`.
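A minimal sketch of that join approach, assuming the results and the FlyBase mapping have already been read into data frames. The column names, the IDs, and the `muscle_genes` table are hypothetical placeholders (the real muscle list still has to be compiled); `protein_id` stands in for column H.

```r
library(dplyr)

# Made-up placeholder data; protein_id stands in for column H of the results.
results <- data.frame(
  protein_id = c("FBpp0000001", "FBpp0000002", "FBpp0000003", "FBpp0000004"),
  intensity  = c(1520, 830, 4100, 270)
)
# Hypothetical FBpp -> FBgn + symbol mapping, as read from the FlyBase .txt file.
id_map <- data.frame(
  FBpp   = c("FBpp0000001", "FBpp0000002", "FBpp0000003", "FBpp0000004"),
  FBgn   = c("FBgn0000010", "FBgn0000020", "FBgn0000020", "FBgn0000030"),
  symbol = c("geneA", "geneB", "geneB", "geneC")
)
# Hypothetical list of muscle genes to exclude.
muscle_genes <- data.frame(FBgn = c("FBgn0000030"))

annotated <- results %>%
  # Add the FBgn ID and gene symbol for each FBpp ID; unmatched rows get NA.
  left_join(id_map, by = c("protein_id" = "FBpp")) %>%
  # Drop every row whose FBgn ID appears in the muscle list.
  anti_join(muscle_genes, by = "FBgn") %>%
  # Keep only the columns of interest, in a readable order.
  select(protein_id, FBgn, symbol, intensity)

annotated
```

`left_join()` keeps every row of the results, so proteins missing from the mapping show up with `NA` instead of silently disappearing. If the real mapping file lists the same FBpp ID more than once, the join will duplicate those result rows, so it is worth checking the mapping for duplicates first.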