Hello everyone,
I'm new to R and I am trying to write a loop! (:
I have a data frame from a csv file that contains about 11 columns and hundreds of rows. One of the columns ("modification") has text like the following:
"1xTMT6plex [K18]; 1xTMT6plex [N-Term]; 1xPhospho [S1(99.9); S20(100)]"
I want to extract "1xPhospho [S1(99.9); S20(100)]" from the "modification" column into a new column (which I will call "Phospho"), keep only the "1xPhospho [S1(99.9)]" part there, and then create a new row ("Phospho_2") with "1xPhospho [S20(100)]", copying all the information in the other columns from the original row.
In some cases more than 2 rows will be needed ("Phospho_3" and so on), because the brackets can contain several "S1(99.9)"-type entries.
When these new rows are created, their ID (in the "accession" column) should be the same as the original but with _1 ... _n appended to the end. Finally, the original row containing "1xPhospho [S1(99.9); S20(100)]" should be deleted.
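To make the goal concrete, here is a rough sketch of the transformation described above on a toy data frame. The column names `accession` and `modification` are taken from the code further down; the accession values and the helper name `split_phospho` are made up for illustration. It also assumes that rows with a single site keep their ID unchanged and that rows without any Phospho entry are kept with `NA`, which may not be what is wanted.

```r
# Toy data; the accession values are invented for illustration only.
pmap <- data.frame(
  accession = c("P11111", "P22222"),
  modification = c(
    "1xTMT6plex [K18]; 1xTMT6plex [N-Term]; 1xPhospho [S1(99.9); S20(100)]",
    "1xTMT6plex [K5]; 1xPhospho [T7(100)]"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns one output row per phospho site.
split_phospho <- function(df) {
  # Pull out the "NxPhospho [...]" part; rows without one stay NA.
  m <- regexpr("[0-9]+xPhospho \\[[^]]*\\]", df$modification)
  hit <- !is.na(m) & m > 0
  phospho <- rep(NA_character_, nrow(df))
  phospho[hit] <- regmatches(df$modification, m)

  pieces <- lapply(seq_len(nrow(df)), function(i) {
    row <- df[i, , drop = FALSE]
    if (is.na(phospho[i])) {
      row$Phospho <- NA_character_
      return(row)
    }
    prefix <- sub(" \\[.*$", "", phospho[i])            # e.g. "1xPhospho"
    inside <- sub("^.*\\[(.*)\\]$", "\\1", phospho[i])  # e.g. "S1(99.9); S20(100)"
    sites <- strsplit(inside, "; *")[[1]]

    # Copy the original row once per site; the multi-site original is not kept.
    out <- row[rep(1, length(sites)), , drop = FALSE]
    out$Phospho <- paste0(prefix, " [", sites, "]")
    if (length(sites) > 1) {
      # Assumption: only rows that were split get the _1 ... _n suffix.
      out$accession <- paste(out$accession, seq_along(sites), sep = "_")
    }
    out
  })

  res <- do.call(rbind, pieces)
  rownames(res) <- NULL
  res
}

result <- split_phospho(pmap)
print(result[, c("accession", "Phospho")])
```

On the toy data this should give three rows: P11111_1 and P11111_2 carrying the S1 and S20 sites, and P22222 unchanged with its single T7 site.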
I have the code for one specific case, but I would like to have it working for all the cases that are in the dataframe.
Here is the code I have now:
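# Extract the "NxPhospho [...]" part of the modification text into a new column
pmap[,"Phospho"] <- sub(".+([0-9]xPhospho [[:punct:]][A-Z0-9()/.; ]+[[:punct:]]).*", "\\1", pmap$modification)
# Copy the rows whose Phospho entry still lists more than one site (contains ";")
tmp_df <- pmap[grep("[;]", pmap$Phospho),]
# In the copies, drop the first site and keep the rest
tmp_df$Phospho <- sub("([0-9]xPhospho [[:punct:]])[A-Z0-9()/.]+[;] (.+)", "\\1\\2", tmp_df$Phospho)
# Give the copies an "_1" suffix and append them to the original data
tmp_df$accession <- paste(tmp_df$accession, 1, sep = "_")
pmap <- rbind(pmap, tmp_df)
# Show the rows that still contain more than one site
pmap[grep("[;]", pmap$Phospho),]
# Hard-coded for row 19 only: keep just the first site in the original row
pmap[19, "Phospho"] <- sub("([0-9]xPhospho [[:punct:]])([A-Z0-9()/.]+)[;] .+", "\\1\\2]", pmap$Phospho[19])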
Help!