From data frame to a score

Hi,
I have a data.frame of gene expression values, as follows.

Sample1 <- c(0.2,0.8,1.02,4,1.6)
Sample2 <- c(0.9,0.2,1.06,0.9,1.6)
Sample3 <- c(1.05,0.8,1.02,1.5,0.8)
input <- data.frame(Sample1,Sample2,Sample3 , row.names = c("geneA","geneB","geneC","geneD","geneE"))

I want to assign a score depending on the expression of each gene in each sample:
for genes A, B and C: <0.5: 0 points; >0.5 and <2: 2 points; >2: 6 points
for genes D and E: <0.5: 10 points; >0.5 and <2: 5 points; >2: 0 points

The result in that case would be as follows:

Sample1 <- c(0,2,2,0,5)
Sample2 <- c(2,0,2,5,5)
Sample3 <- c(2,2,2,5,5)
result <- data.frame(Sample1,Sample2,Sample3 , row.names = c("geneA","geneB","geneC","geneD","geneE"))

Then I will sum each column to get the final score for each sample.

I can't manage to write the function that goes from input to result. Thanks for your help!

Regards

Simon

I would write a function with all the if statements in it.

calc_score <- function(gene, ex) {
  if (gene == "A") {
    if (ex< 0.5) {
      score <- 0
    } else { 
      ...
    }
  } else if (gene == "B") {
   ...
  }
  return(score)
}
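
For illustration, here is one way the skeleton above might be filled in. It is only a sketch under two assumptions: the gene labels are the full row names from the question ("geneA" to "geneE") rather than single letters, and values of exactly 0.5 or 2 fall into the higher band, which the question leaves open.

```r
# Sketch of a complete calc_score().
# Assumptions: gene labels are "geneA".."geneE"; exactly 0.5 or 2 counts
# as the higher band (the question does not say).
calc_score <- function(gene, ex) {
  if (gene %in% c("geneA", "geneB", "geneC")) {
    if (ex < 0.5) {
      score <- 0
    } else if (ex < 2) {
      score <- 2
    } else {
      score <- 6
    }
  } else if (gene %in% c("geneD", "geneE")) {
    if (ex < 0.5) {
      score <- 10
    } else if (ex < 2) {
      score <- 5
    } else {
      score <- 0
    }
  } else {
    # Fail loudly instead of returning an undefined score
    stop("Unknown gene: ", gene)
  }
  return(score)
}

calc_score("geneA", 0.2)  # 0
calc_score("geneD", 4)    # 0
calc_score("geneE", 1.6)  # 5
```

The function handles one gene and one value at a time, so it has to be applied element by element rather than to whole columns.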

Then pivot your data to long format so you have columns for sample ID, gene ("A", "B", "C", ...) and expression (numeric), and apply the function to each row:

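library(dplyr)
library(purrr)

# map2_dbl() returns a numeric vector, so sum() works on the result
df2 <- df %>%
  mutate(score = map2_dbl(gene, ex, calc_score)) %>%
  group_by(id) %>%
  summarize(total_score = sum(score)) %>%
  ungroup()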
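
The long-format data frame `df` is not shown in this reply. A minimal sketch of how it could be built from the `input` in the question, assuming the column names `id`, `gene` and `ex` used in the pipeline, and assuming the gene labels are kept as the full row names:

```r
library(tidyr)

Sample1 <- c(0.2, 0.8, 1.02, 4, 1.6)
Sample2 <- c(0.9, 0.2, 1.06, 0.9, 1.6)
Sample3 <- c(1.05, 0.8, 1.02, 1.5, 0.8)
input <- data.frame(Sample1, Sample2, Sample3,
                    row.names = c("geneA", "geneB", "geneC", "geneD", "geneE"))

# Row names are dropped when pivoting, so copy them into a regular column first
input$gene <- rownames(input)

# Stack every column except gene: one row per gene/sample pair
df <- pivot_longer(input, cols = -gene, names_to = "id", values_to = "ex")
df
```

This gives 15 rows (5 genes times 3 samples) with columns `gene`, `id` and `ex`.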

I suggest using the case_when() function from dplyr and making a new Type column to simplify the logical structure.

Sample1 <- c(0.2,0.8,1.02,4,1.6)
Sample2 <- c(0.9,0.2,1.06,0.9,1.6)
Sample3 <- c(1.05,0.8,1.02,1.5,0.8)
Gene <- c("geneA","geneB","geneC","geneD","geneE")
input <- data.frame(Gene,Sample1,Sample2,Sample3)
library(dplyr)
library(tidyr)
input <- input %>% mutate(Type=case_when(
  Gene %in% c("geneA", "geneB", "geneC") ~ 1,
  TRUE ~ 2
))
inputLong <- input %>% pivot_longer(Sample1:Sample3, 
                                    names_to = "Sample", 
                                    values_to = "Value")
inputLong <- inputLong %>% mutate(Score = case_when(
  Type == 1 & Value < 0.5 ~ 0,
  Type == 1 & Value < 2 ~ 2,
  Type == 1 & Value >= 2 ~ 6,
  Type == 2 & Value < 0.5 ~ 10,
  Type == 2 & Value < 2 ~ 5,
  Type == 2 & Value >= 2 ~ 0
))
FinalDF <- inputLong %>% select(Gene, Sample, Score) %>% 
  pivot_wider(names_from = Sample, values_from = Score)
FinalDF
#> # A tibble: 5 x 4
#>   Gene  Sample1 Sample2 Sample3
#>   <chr>   <dbl>   <dbl>   <dbl>
#> 1 geneA       0       2       2
#> 2 geneB       2       0       2
#> 3 geneC       2       2       2
#> 4 geneD       0       5       5
#> 5 geneE       5       5       5

Created on 2021-06-28 by the reprex package (v0.3.0)
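
The question also asks for the per-sample total, which is the sum of each score column. A small sketch in base R; the `scores` data frame is a hypothetical stand-in that copies the values from the printed output so the snippet runs on its own:

```r
# Scores copied from the printed table (stand-in for FinalDF)
scores <- data.frame(
  Gene    = c("geneA", "geneB", "geneC", "geneD", "geneE"),
  Sample1 = c(0, 2, 2, 0, 5),
  Sample2 = c(2, 0, 2, 5, 5),
  Sample3 = c(2, 2, 2, 5, 5)
)

# Drop the Gene label column, then sum each sample column
totals <- colSums(scores[, -1])
totals
#> Sample1 Sample2 Sample3 
#>       9      14      16
```

The same call should work on the tibble from the reprex, for example `colSums(FinalDF[, -1])`.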


That's perfect, thanks to both of you !
