SimonG
November 17, 2022, 10:14pm
1
Hi,
I have a data frame like this:
df<-data.frame(SYMBOL= c(rep("RET",4),rep("ROS",5),rep("ALK",3)),
region = c("Promoter1","Promoter2",'intronic',"intronic","NCR","intronic","Promoter1","intronic","NCR","intronic","Promoter1","Promoter2"),
value = sample(x=1:15,size=12))
I want a summarised data frame that gives, for each SYMBOL, the mean value of the promoter regions (Promoter1 or Promoter2) divided by the mean value of the non-promoter regions, like this:
SYMBOL Value
RET X
ROS Y
ALK Z
Thank you
Simon
Is this what you are asking?
library(tidyverse)
#> Warning: package 'ggplot2' was built under R version 4.2.2
df<-data.frame(SYMBOL= c(rep("RET",4),rep("ROS",5),rep("ALK",3)),
region = c("Promoter1","Promoter2",'intronic',"intronic","NCR","intronic","Promoter1","intronic","NCR","intronic","Promoter1","Promoter2"),
value = sample(x=1:15,size=12))
df %>%
  group_by(
    SYMBOL,
    region
  ) %>%
  summarise(
    mean(value)
  )
#> `summarise()` has grouped output by 'SYMBOL'. You can override using the
#> `.groups` argument.
#> # A tibble: 9 × 3
#> # Groups: SYMBOL [3]
#> SYMBOL region `mean(value)`
#> <chr> <chr> <dbl>
#> 1 ALK intronic 8
#> 2 ALK Promoter1 9
#> 3 ALK Promoter2 12
#> 4 RET intronic 10
#> 5 RET Promoter1 5
#> 6 RET Promoter2 3
#> 7 ROS intronic 5.5
#> 8 ROS NCR 6.5
#> 9 ROS Promoter1 15
Created on 2022-11-17 with reprex v2.0.2
SimonG
November 18, 2022, 4:57pm
3
Hi,
This is the first step, but the final step, which I have not managed to reach, is to get for each SYMBOL the mean of all promoter values (Promoter1 and Promoter2) divided by the mean of all non-promoter values (intronic and NCR). The final result would be:
SYMBOL RESULTS
ALK mean(Promoter1,Promoter2)/mean(intronic,NCR)
etc...
Note that my real df has many more region values, so something like grep("Promoter") and -grep("Promoter") is needed to identify the two classes.
Thanks
In that case, I think this strategy will work:
library(tidyverse)
#> Warning: package 'ggplot2' was built under R version 4.2.2
df<-data.frame(SYMBOL= c(rep("RET",4),rep("ROS",5),rep("ALK",3)),
region = c("Promoter1","Promoter2",'intronic',"intronic","NCR","intronic","Promoter1","intronic","NCR","intronic","Promoter1","Promoter2"),
value = sample(x=1:15,size=12))
df %>%
  mutate(
    group = case_when(
      str_detect(string = region, pattern = "Promoter") ~ "promoter",
      # str_detect(string = region, pattern = "intronic") ~ "intronic",
      # str_detect(string = region, pattern = "NCR") ~ "ncr",
      TRUE ~ "other"
    )
  ) %>%
  group_by(
    SYMBOL,
    group
  ) %>%
  summarise(
    average = mean(value)
  ) %>%
  ungroup() %>%
  pivot_wider(
    names_from = group,
    values_from = average
  ) %>%
  mutate(
    symbol_mean = promoter / other
  )
#> `summarise()` has grouped output by 'SYMBOL'. You can override using the
#> `.groups` argument.
#> # A tibble: 3 × 4
#> SYMBOL other promoter symbol_mean
#> <chr> <dbl> <dbl> <dbl>
#> 1 ALK 6 7.5 1.25
#> 2 RET 14 2.5 0.179
#> 3 ROS 7.75 1 0.129
Created on 2022-11-18 with reprex v2.0.2
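Since grep() was mentioned, here is a shorter variant as a rough sketch: it uses grepl() inside a single summarise() call instead of case_when() plus pivot_wider(). The set.seed() call and the names ratio_df, promoter, other and ratio are my own additions for illustration and are not part of the original example. It assumes every non-promoter region should be pooled into one class.

```r
library(dplyr)

# Same toy data as above; set.seed() is an added assumption so the run is repeatable
set.seed(1)
df <- data.frame(
  SYMBOL = c(rep("RET", 4), rep("ROS", 5), rep("ALK", 3)),
  region = c("Promoter1", "Promoter2", "intronic", "intronic",
             "NCR", "intronic", "Promoter1", "intronic", "NCR",
             "intronic", "Promoter1", "Promoter2"),
  value  = sample(x = 1:15, size = 12)
)

ratio_df <- df %>%
  group_by(SYMBOL) %>%
  summarise(
    # mean over all rows whose region contains "Promoter"
    promoter = mean(value[grepl("Promoter", region)]),
    # mean over all remaining rows (intronic, NCR, ...)
    other    = mean(value[!grepl("Promoter", region)]),
    ratio    = promoter / other,
    .groups  = "drop"
  )

ratio_df
```

One thing to watch with either approach: a SYMBOL that has no promoter rows, or no non-promoter rows, does not raise an error. Here mean() of an empty vector returns NaN, while the pivot_wider() version leaves an NA in the missing column.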
SimonG
November 18, 2022, 6:42pm
5
rene_at_coco:
That's it, thanks a lot!