I have a data frame `df` which contains sample names, the number of samples, and the cluster number. For example, there are 3 of Sample_A: 2 of those samples are in cluster 12, and the remaining one is in cluster 15:

| Sample | Number_Samples | Cluster |
|---|---|---|
| Sample_A | 3 | 12 |
| Sample_A | 3 | 12 |
| Sample_A | 3 | 15 |
| Sample_B | 1 | 10 |
| Sample_C | 2 | 12 |
| Sample_C | 2 | 14 |
| Sample_D | 4 | 7 |
| Sample_D | 4 | 20 |
| Sample_D | 4 | 20 |
| Sample_D | 4 | 20 |

How can I add a column called **Percent_Observed** that gives the percentage of each sample type that each cluster represents? For example, there is only 1 of Sample_B, so cluster 10 represents 100% of Sample_B.

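I'm finding this a little tricky since the clusters are not unique: the same cluster number can appear under more than one sample (cluster 12 shows up for both Sample_A and Sample_C). My goal is to have: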

| Sample | Number_Samples | Cluster | Percent_Observed |
|---|---|---|---|
| Sample_A | 3 | 12 | 66.66 |
| Sample_A | 3 | 12 | 66.66 |
| Sample_A | 3 | 15 | 33.33 |
| Sample_B | 1 | 10 | 100 |
| Sample_C | 2 | 12 | 50 |
| Sample_C | 2 | 14 | 50 |
| Sample_D | 4 | 7 | 25 |
| Sample_D | 4 | 20 | 75 |
| Sample_D | 4 | 20 | 75 |
| Sample_D | 4 | 20 | 75 |
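
For reference, here is a minimal sketch of the calculation I have in mind. It assumes `df` is a pandas DataFrame (an assumption on my part; the same grouping idea should carry over to other data frame libraries) and that `Number_Samples` always equals the number of rows for that sample. The name `pair_counts` is just a placeholder for illustration.

```python
import pandas as pd

# Rebuild the example data from the first table.
df = pd.DataFrame({
    "Sample": ["Sample_A"] * 3 + ["Sample_B"] + ["Sample_C"] * 2 + ["Sample_D"] * 4,
    "Number_Samples": [3, 3, 3, 1, 2, 2, 4, 4, 4, 4],
    "Cluster": [12, 12, 15, 10, 12, 14, 7, 20, 20, 20],
})

# Count the rows in each (Sample, Cluster) pair and broadcast that count
# back onto every row, so non-unique cluster numbers stay tied to their sample.
pair_counts = df.groupby(["Sample", "Cluster"])["Cluster"].transform("size")

# Share of the sample that falls in this row's cluster, as a percentage.
df["Percent_Observed"] = (pair_counts / df["Number_Samples"] * 100).round(2)

print(df)
```

Grouping on both `Sample` and `Cluster` is what keeps cluster 12 in Sample_A separate from cluster 12 in Sample_C. Note that rounding to two decimals gives 66.67 for Sample_A's cluster 12 rather than the truncated 66.66 shown in my table.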