pivot dataframe

Hello,
I always have issues with pivot_wider or pivot_longer.
I have a data frame with mutation status and group:
data.frame(KRAS = c("M","NM","NM","M"),EGFR=c("M","NM","M","NM"),TP53=c("NM","M","M","NM"), group = c("1","2","3","1"))
I would like a final data frame with the number of mutations for each group:

group   KRAS M   KRAS NM   EGFR M   EGFR NM   TP53 M   TP53 NM
1       2        0         ...
2       0        1         ...
3       0        1         ...

in order to perform a chi-square test of the distribution.
Best.

Simon

Hi there,

The pivot functions are very powerful, but they can indeed be tricky to wrap your head around. In your example, however, a single pivot function won't solve the problem; what you want is a summary across the columns, per group. That translates to this:

library(tidyverse)

myData = data.frame(
  KRAS = c("M", "NM", "NM", "M"),
  EGFR = c("M", "NM", "M", "NM"),
  TP53 = c("NM", "M", "M", "NM"),
  group = c("1", "2", "3", "1")
)

myData %>% group_by(group) %>% 
  summarise(across(everything(), function(x) sum(x == "M")))
#> # A tibble: 3 x 4
#>   group  KRAS  EGFR  TP53
#>   <chr> <int> <int> <int>
#> 1 1         2     1     0
#> 2 2         0     0     1
#> 3 3         0     1     1

Created on 2022-02-25 by the reprex package (v2.0.1)

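Using across() inside summarise() applies the same function to every non-grouping column at once, here counting the "M" entries per group, which gives the numbers you need. The "NM" counts can be obtained the same way by testing x == "NM" instead.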
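If you want the M and NM counts side by side, as in the layout from the question, one possible approach is to go long, count, and then go wide again. This is only a sketch: the names `counts`, `gene`, `status` and `kras_table` are my own, and the `KRAS_M`-style column names come from pivot_wider's default separator. Note that a gene/status combination that never occurs in the data would not get a column.

```r
library(dplyr)
library(tidyr)

myData <- data.frame(
  KRAS = c("M", "NM", "NM", "M"),
  EGFR = c("M", "NM", "M", "NM"),
  TP53 = c("NM", "M", "M", "NM"),
  group = c("1", "2", "3", "1")
)

counts <- myData %>%
  # one row per (group, gene, status) observation
  pivot_longer(-group, names_to = "gene", values_to = "status") %>%
  # number of observations per group, gene and status
  count(group, gene, status) %>%
  # one column per gene/status pair; missing combinations become 0
  pivot_wider(
    names_from = c(gene, status),
    values_from = n,
    values_fill = 0,
    names_sort = TRUE
  )

counts

# For a single gene, base R's table() gives a group x status
# contingency table directly
kras_table <- table(myData$group, myData$KRAS)
kras_table
```

A table like `kras_table` can be passed to chisq.test(). With counts this small, R will likely warn that the chi-square approximation may be inaccurate, so fisher.test() may be worth considering.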

Hope this helps,
PJ


That works indeed, thanks!
