 # How can I calculate percent occurrence by row?

I have a data frame `df` that contains sample names, the number of samples, and the cluster number. For example, there are 3 rows for Sample_A: 2 of them are in cluster 12 and the remaining one is in cluster 15:

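| Sample | Number_Samples | Cluster |
|---|---|---|
| Sample_A | 3 | 12 |
| Sample_A | 3 | 12 |
| Sample_A | 3 | 15 |
| Sample_B | 1 | 10 |
| Sample_C | 2 | 12 |
| Sample_C | 2 | 14 |
| Sample_D | 4 | 7 |
| Sample_D | 4 | 20 |
| Sample_D | 4 | 20 |
| Sample_D | 4 | 20 |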

How can I add a column called Percent_Observed that gives the percentage of each sample type that each cluster represents? For example, there is only 1 of Sample_B, so cluster 10 represents 100% of Sample_B.
I'm finding this a little tricky since the clusters are not unique. My goal is to have:

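| Sample | Number_Samples | Cluster | Percent_Observed |
|---|---|---|---|
| Sample_A | 3 | 12 | 66.66 |
| Sample_A | 3 | 12 | 66.66 |
| Sample_A | 3 | 15 | 33.33 |
| Sample_B | 1 | 10 | 100 |
| Sample_C | 2 | 12 | 50 |
| Sample_C | 2 | 14 | 50 |
| Sample_D | 4 | 7 | 25 |
| Sample_D | 4 | 20 | 75 |
| Sample_D | 4 | 20 | 75 |
| Sample_D | 4 | 20 | 75 |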

```r
library(tidyverse)

df <- structure(list(Sample = c("Sample_A", "Sample_A", "Sample_A",
"Sample_B", "Sample_C", "Sample_C", "Sample_D", "Sample_D", "Sample_D",
"Sample_D"), Number_Samples = c(3L, 3L, 3L, 1L, 2L, 2L, 4L, 4L,
4L, 4L), Cluster = c(12L, 12L, 15L, 10L, 12L, 14L, 7L, 20L, 20L,
20L)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
))

# Count the rows per Sample (rc), then the rows per Sample/Cluster pair,
# and express the pair count as a percentage of the Sample total.
group_by(df, Sample) %>%
  mutate(rc = n()) %>%
  group_by(Sample, Cluster) %>%
  mutate(percent_observed = round(n() / rc, digits = 4) * 100)
```

Thank you! This worked beautifully!
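
For anyone who would rather not load extra packages, here is a rough base R sketch of the same idea using `ave()`. It rebuilds the example data as a plain `data.frame`; the helper names `pair_n` and `sample_n` are made up for illustration, and the rounding to two decimals is an assumption about the desired output.

```r
# Example data from the question, as a plain data.frame
df <- data.frame(
  Sample = c("Sample_A", "Sample_A", "Sample_A", "Sample_B", "Sample_C",
             "Sample_C", "Sample_D", "Sample_D", "Sample_D", "Sample_D"),
  Number_Samples = c(3L, 3L, 3L, 1L, 2L, 2L, 4L, 4L, 4L, 4L),
  Cluster = c(12L, 12L, 15L, 10L, 12L, 14L, 7L, 20L, 20L, 20L)
)

idx <- seq_len(nrow(df))

# Number of rows sharing the same Sample/Cluster pair (hypothetical helper name)
pair_n <- ave(idx, df$Sample, df$Cluster, FUN = length)

# Number of rows sharing the same Sample (hypothetical helper name)
sample_n <- ave(idx, df$Sample, FUN = length)

# Percentage of each Sample that falls in the row's Cluster
df$Percent_Observed <- round(100 * pair_n / sample_n, 2)

print(df)
```

Because `ave()` returns a vector as long as its input, the percentages line up row by row with the original data, so repeated clusters simply get the same value on each of their rows.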

```r
library(data.table)

dt <- setDT(df)  # converts df to a data.table by reference
str(dt)
# rc = number of rows per Sample
dt[, rc := .N, by = .(Sample)]
# .N is the number of rows per Sample/Cluster pair
dt[, .(percent_observed = round(.N / rc, digits = 3) * 100),
   by = .(Sample, Number_Samples, Cluster)]
#>       Sample Number_Samples Cluster percent_observed
#>  1: Sample_A              3      12             66.7
#>  2: Sample_A              3      12             66.7
#>  3: Sample_A              3      15             33.3
#>  4: Sample_B              1      10            100.0
#>  5: Sample_C              2      12             50.0
#>  6: Sample_C              2      14             50.0
#>  7: Sample_D              4       7             25.0
#>  8: Sample_D              4      20             75.0
#>  9: Sample_D              4      20             75.0
#> 10: Sample_D              4      20             75.0
```
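
Since the question asks for a new column rather than a new table, a possible variation on the data.table approach is to assign the percentage by reference with `:=`. This is only a sketch: it assumes the data.table package is installed, rebuilds the example data directly, names the column `Percent_Observed` to match the question, and drops the temporary `rc` column at the end.

```r
library(data.table)

# Example data from the question
dt <- data.table(
  Sample = c("Sample_A", "Sample_A", "Sample_A", "Sample_B", "Sample_C",
             "Sample_C", "Sample_D", "Sample_D", "Sample_D", "Sample_D"),
  Number_Samples = c(3L, 3L, 3L, 1L, 2L, 2L, 4L, 4L, 4L, 4L),
  Cluster = c(12L, 12L, 15L, 10L, 12L, 14L, 7L, 20L, 20L, 20L)
)

# Temporary column: number of rows per Sample
dt[, rc := .N, by = Sample]

# Add the percentage in place; .N is the row count per Sample/Cluster pair
dt[, Percent_Observed := round(100 * .N / rc, 1), by = .(Sample, Cluster)]

# Remove the temporary column
dt[, rc := NULL]

print(dt)
```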
