Hi, welcome to the forum. Here is a very quick and dirty first cut at some of what I think you want. However, if you have more than 800,000 patients, I suspect that some of the coding, especially for the therapies variable, will be more complicated than in your example.
In any case, am I even on the right path for what you want?
To cut down on typing I have renamed your data.frame and variables (new name -> your original name):
dat1 -> df # df is the actual name of a function, so better not to use it for data
id -> ID
therapy -> therapies
number -> number of therapies
duration -> duration of therapy in min.
total.dur -> Total duration of inpatient treatment
dat1 <- data.frame(id = c("1","1","1","1","1","2","2","2","3","3"),
                   therapy = c("A51", "B32", "A67", "A99", "L37", "A64", "A51", "L45", "B32", "A55"),
                   number = c(8, 2, 6, 1, 7, 15, 3, 2, 9, 10),
                   duration = c(240, 120, 189, 30, 210, 450, 60, 60, 180, 400),
                   total.dur = c(21, 21, 21, 21, 21, 24, 24, 24, 18, 18))
library(tidyverse)
library("stringr")
dat1$alph <- str_sub(dat1$therapy, 1, 1) # extract alpha part of "therapy"
dat1$num <- str_sub(dat1$therapy, -2, -1) # extract numeric part of "therapy". Not needed at the moment
dat2 <- dat1 %>% group_by(alph, id)
# Note: sd() returns NA for any group that has only one row
dat2 %>% summarise(mean.duration = mean(duration), sd.duration = sd(duration),
                   mean.total.dur = mean(total.dur), sd.total.dur = sd(total.dur), n = n())
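
If the real therapy codes are not always one letter followed by two digits, the fixed positions used in str_sub() above will break. Below is a rough sketch of one way around that, using a regular expression instead. The codes in it are made up by me for illustration (they are not from your data), and I am assuming each code is some letters followed by some digits, which may not hold for your real data.

```r
library(dplyr)
library(stringr)

# Hypothetical codes of varying length, invented for illustration only
codes <- data.frame(id = c("1", "1", "2", "2"),
                    therapy = c("A51", "AB7", "B1234", "A5"),
                    duration = c(240, 120, 60, 30))

# Split each code into its leading letters and trailing digits,
# whatever their lengths
codes2 <- codes %>%
  mutate(alph = str_extract(therapy, "^[A-Za-z]+"),
         num  = as.numeric(str_extract(therapy, "[0-9]+$")))

# Summarise by therapy letter(s) only; .groups = "drop" returns an ungrouped result
res <- codes2 %>%
  group_by(alph) %>%
  summarise(mean.duration = mean(duration), n = n(), .groups = "drop")

print(res)
```

Here I grouped by alph alone rather than by alph and id, just to keep the toy example small. Add id back into group_by() if you want per-patient summaries.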