I have a table which looks like this but with many more entries:
ID Gene Tier Consequence
1314 ABC TIER1 missense
1314 PKD1 TIER1 frameshift
1314 PKD1 TIER1 stop_gain
6245 BJD TIER1 splice_site_variant
7631 PKD2 TIER1 missense
7631 PKD2 TIER1 non_coding
5336 PKD1 TIER3 missense
1399 PKD1 TIER2 non_coding
I would like to select one row per ID, with the preference TIER1 > TIER2 > TIER3 and then stop_gain > frameshift > splice_site_variant > missense > non_coding. In reality there are roughly 10 types of "Consequence" in a hierarchical order.
I have already subset the table on TIER1:
tier1 <- df[which(df$Tier == 'TIER1'), ]
but now I want to subset on the hierarchy of consequence so that I get one line per ID, with the highest-ranking consequence selected.
Desired outcome:
ID Gene Tier Consequence
1314 PKD1 TIER1 stop_gain
6245 BJD TIER1 splice_site_variant
7631 PKD2 TIER1 missense
5336 PKD1 TIER3 missense
1399 PKD1 TIER2 non_coding
I thought about turning the consequences into numbers and then using those, but was wondering whether there is a way of doing this with the text itself. I work on an HPC in an airlock environment, so solutions using base R would be preferable.
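To make the goal concrete, here is a minimal base R sketch of the kind of selection I am after. It assumes the data frame is called df with the columns shown above. The names tier_levels, cons_levels, tier_rank and cons_rank are placeholders of mine, and cons_levels lists only the five consequences from the example rather than the full set of roughly 10. It ranks the text labels with match() instead of recoding the columns, so it may not be the best way to do this.

```r
# Example data copied from the table above.
df <- data.frame(
  ID = c(1314, 1314, 1314, 6245, 7631, 7631, 5336, 1399),
  Gene = c("ABC", "PKD1", "PKD1", "BJD", "PKD2", "PKD2", "PKD1", "PKD1"),
  Tier = c("TIER1", "TIER1", "TIER1", "TIER1", "TIER1", "TIER1", "TIER3", "TIER2"),
  Consequence = c("missense", "frameshift", "stop_gain", "splice_site_variant",
                  "missense", "non_coding", "missense", "non_coding"),
  stringsAsFactors = FALSE
)

# Assumed rankings, most preferred first (placeholder names;
# cons_levels would need extending to cover every consequence type).
tier_levels <- c("TIER1", "TIER2", "TIER3")
cons_levels <- c("stop_gain", "frameshift", "splice_site_variant",
                 "missense", "non_coding")

# match() returns each label's position in its ranking, so the text
# columns themselves are left untouched. Labels missing from a ranking
# become NA, which order() places last by default.
tier_rank <- match(df$Tier, tier_levels)
cons_rank <- match(df$Consequence, cons_levels)

# Sort by first appearance of each ID, then by tier, then by consequence,
# so the preferred row of every ID comes first within its group.
ord <- order(match(df$ID, unique(df$ID)), tier_rank, cons_rank)
sorted <- df[ord, ]

# Keep only the first (best-ranked) row per ID.
best <- sorted[!duplicated(sorted$ID), ]
print(best)
```

On the example table this keeps one row per ID in the same order as the desired outcome above. Ties within an ID (same tier and same consequence) are resolved by original row order.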
Many thanks for your time