Find "mean" by time and id

Hello,

I have got a huge dataset of several thousands of patients. My dataset consists of: patient id and values for a specific blodsample measured several times during 1 year. I have created columns for the periods I am interested in (0-3 months, 3-6 and 6-12 months). Every row in the created period-columns is indicated by "1" if the samplingdate is within the period (ie if the samplingdate of a value is within 0-3 months, then the column 0-3 gets "1", whereas the others columns gets "0" and so on for all value).

What I need to do is to find a mean of blodsample-values for every period for every patient, such that I end up with only 1 row per patient showing: patient id, and the mean value for every period (some periods will end up with mean "0" as there are no recordings for this period, which is ok.

Anybody has any idea how to do that?

Only the first 50 recordings are shown in the dput below...

structure(list(id=c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50), value= c(10.1, 10.2, 10.5, 10.5, 10.5, 10.6, 10.7, 10.7, 10.8, 10.8, 10.9, 10.9, 10.1, 10.1, 10.1, 10.1, 10.2, 10.2, 10.3, 10.3, 10.3, 10.4, 10.5, 10.6, 11.2, 10.8, 7.2, 8.1, 8.3, 8.4, 8.6, 8.9, 9.3, 9.5, 9.5, 9.6, 9.6, 9.7, 9.7, 9.7, 9.8, 9.3, 9.6, 9.6, 9.8, 9.8, 10.1, 6.3, 6.6, 6.9), mo_3= c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0), mo_6= c (0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0),
mo_12= c(1,0,1,1,0,0,1,1,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,1,1,0,1,1,1,0,0,0,0,0,0,0,0,0)), sorted="id", class= c("data.table","data.frame"), row.names= c(NA, -50L), internal.selfref = <pointer: 0x0000000000121ef0>) 

Would appreciate any help :slight_smile:

Hi Nice!

It sounds like you might be able to do what you want if you make your data tidy (e.g., rows should contain unique observations and columns should contain variables) and the use dplyr::group_by followed by dplyr::summarise.

I think that at the moment your data is not tidy because of your different period columns; they should be values in a single column instead of being different columns/variables. You can combine them in such a column using tidyr::pivot_longer and then (assuming I understood your data correctly :sweat_smile:) you can use dplyr::group_by to group your data by id and time period, and then dplyr::summarise to get the mean values for each patient in each time period.

I can try to give you a working example if you provide your data as tibble/data frame (I didn't manage to load the data with the code you provided, sorry).

Cheers,

F

@francisko.rezende

Thank you very much!
I will try to do this :slight_smile:

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.