Hi, FJCC, THANK YOU again!
However, the final results applying all coefficients were as follows:
A tibble: 6 x 2
Patient Age
1 Patient1 0.145
2 Patient2 3.08
3 Patient3 1.96
4 Patient4 2.63
5 Patient5 1.90
6 Patient6 2.84
I had already applied this formula in Excel, following tutorials I found online, and I got the same results. That is strange, because the values are very different from the actual ages. The ages (in months) that I posted in my last reply are the patients' original ages:
| Predict_Age_Pat1 | Predict_Age_Pat2 | Predict_Age_Pat3 | Predict_Age_Pat4 | Predict_Age_Pat5 | Predict_Age_Pat6 |
|---|---|---|---|---|---|
| 52 | 165 | 146 | 97 | 146 | 81 |
See:
Coef <- read.table("coef.txt", stringsAsFactors = FALSE, header = TRUE, dec = ",")
head(Coef)
CpGs Active.coefficients
1 (Intercept) 2.3768538
2 cg00343092 -1.1359977
3 cg00563932 -2.2549679
4 cg00571634 1.2051302
5 cg00629217 -1.4431147
6 cg01511567 0.5572547
Patients <- read.table("patients.txt", stringsAsFactors = FALSE, header = TRUE)
head(Patients)
CpGs Patient1 Patient2 Patient3 Patient4 Patient5 Patient6
1 cg00343092 0.15721 0.01721 0.08350 0.09762 0.12237 0.06944
2 cg00563932 0.60688 0.61560 0.59548 0.58357 0.57340 0.55717
3 cg00571634 0.08351 0.12219 0.09979 0.13198 0.10580 0.10946
4 cg00629217 0.84616 0.88872 0.86079 0.86194 0.87492 0.83890
5 cg01511567 0.26409 0.20216 0.16384 0.18531 0.20060 0.20192
6 cg01515426 0.01682 0.01337 0.01863 0.00866 0.01611 0.02004
library(dplyr)
library(tidyr)
PatientsTall <- Patients %>% gather(key = Patient, value = Value, Patient1:Patient6)
head(PatientsTall);tail(PatientsTall)
CpGs Patient Value
1 cg00343092 Patient1 0.15721
2 cg00563932 Patient1 0.60688
3 cg00571634 Patient1 0.08351
4 cg00629217 Patient1 0.84616
5 cg01511567 Patient1 0.26409
6 cg01515426 Patient1 0.01682
CpGs Patient Value
661 cg25827666 Patient6 0.88086
662 cg26227465 Patient6 0.84195
663 cg26232558 Patient6 0.85450
664 cg26379475 Patient6 0.85061
665 cg27003571 Patient6 0.78404
666 cg27553955 Patient6 0.29984
PatientsTall <- inner_join(PatientsTall, Coef, by = "CpGs")
head(PatientsTall);tail(PatientsTall)
CpGs Patient Value Active.coefficients
1 cg00343092 Patient1 0.15721 -1.1359977
2 cg00563932 Patient1 0.60688 -2.2549679
3 cg00571634 Patient1 0.08351 1.2051302
4 cg00629217 Patient1 0.84616 -1.4431147
5 cg01511567 Patient1 0.26409 0.5572547
6 cg01515426 Patient1 0.01682 -4.7538892
CpGs Patient Value Active.coefficients
661 cg25827666 Patient6 0.88086 -1.9203303
662 cg26227465 Patient6 0.84195 0.3444122
663 cg26232558 Patient6 0.85450 0.3862605
664 cg26379475 Patient6 0.85061 -0.5471105
665 cg27003571 Patient6 0.78404 -0.6753860
666 cg27553955 Patient6 0.29984 1.2156377
PatientsTall <- PatientsTall %>% mutate(CalcValue = Value * Active.coefficients)
head(PatientsTall);tail(PatientsTall)
CpGs Patient Value Active.coefficients CalcValue
1 cg00343092 Patient1 0.15721 -1.1359977 -0.17859019
2 cg00563932 Patient1 0.60688 -2.2549679 -1.36849494
3 cg00571634 Patient1 0.08351 1.2051302 0.10064043
4 cg00629217 Patient1 0.84616 -1.4431147 -1.22110597
5 cg01511567 Patient1 0.26409 0.5572547 0.14716539
6 cg01515426 Patient1 0.01682 -4.7538892 -0.07996042
CpGs Patient Value Active.coefficients CalcValue
661 cg25827666 Patient6 0.88086 -1.9203303 -1.6915421
662 cg26227465 Patient6 0.84195 0.3444122 0.2899778
663 cg26232558 Patient6 0.85450 0.3862605 0.3300596
664 cg26379475 Patient6 0.85061 -0.5471105 -0.4653776
665 cg27003571 Patient6 0.78404 -0.6753860 -0.5295296
666 cg27553955 Patient6 0.29984 1.2156377 0.3644968
Ages <- PatientsTall %>% group_by(Patient) %>% summarize(Age = sum(CalcValue) + Coef[1,2])
Ages
A tibble: 6 x 2
Patient Age
1 Patient1 0.145
2 Patient2 3.08
3 Patient3 1.96
4 Patient4 2.63
5 Patient5 1.90
6 Patient6 2.84
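As a cross-check on the join-and-sum steps above, the same linear predictor can be computed as a matrix product. The sketch below uses a small made-up table (three CpGs, two patients; the numbers are illustrative only and do not come from coef.txt or patients.txt) and assumes the intercept row is labelled `(Intercept)`, as in `head(Coef)`. It also stops if any CpG with a coefficient is missing from the patient table, which `inner_join` would drop silently.

```r
# Toy stand-ins for Coef and Patients (values are made up for illustration)
Coef <- data.frame(
  CpGs = c("(Intercept)", "cgA", "cgB", "cgC"),
  Active.coefficients = c(2.0, -1.0, 0.5, 3.0),
  stringsAsFactors = FALSE
)
Patients <- data.frame(
  CpGs = c("cgB", "cgA", "cgC"),   # deliberately in a different order
  Patient1 = c(0.2, 0.1, 0.3),
  Patient2 = c(0.4, 0.5, 0.6),
  stringsAsFactors = FALSE
)

# Separate the intercept from the per-CpG weights
intercept <- Coef$Active.coefficients[Coef$CpGs == "(Intercept)"]
weights <- Coef[Coef$CpGs != "(Intercept)", ]

# Every weighted CpG must be present in the patient data
missing_cpgs <- setdiff(weights$CpGs, Patients$CpGs)
stopifnot(length(missing_cpgs) == 0)

# Align patient rows to the coefficient order, then multiply
idx <- match(weights$CpGs, Patients$CpGs)
beta <- as.matrix(Patients[idx, -1])
linear_part <- intercept + drop(crossprod(beta, weights$Active.coefficients))
linear_part
```

If the result for the real data matches the `Ages` tibble, the arithmetic of the join and sum is fine and the discrepancy lies elsewhere, for example in the scale of the output.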
In the methodology, the author says that age was transformed with function F before the model was trained, and that the inverse of F was used to turn the linear part of the model back into DNA methylation age (see below). Do the values obtained from the coefficients need to be transformed as well? (I think not!)
I can't understand why it isn't working!

Based on the training set data, we found it advantageous to transform age using function F before building the prediction model. Using the inverse of function F, we transformed the linear part of the regression model into the DNA methylation age. Function F was as follows (toddler.age was set to 48 months):
F(age) = log(age + 1) − log(toddler.age + 1), if age ≤ toddler.age
F(age) = (age − toddler.age) / (toddler.age + 1), if age > toddler.age
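Since the quoted text says the linear part of the model is converted to DNA methylation age with the inverse of F, here is a minimal sketch of that inverse in R. It is derived by solving each branch of F for age, and it assumes natural logarithms and toddler.age = 48 months. The names `F_age` and `F_inverse` are my own, not from the paper, and the input values are the rounded numbers printed in the `Ages` tibble.

```r
# F as quoted above: log scale up to toddler.age, linear beyond it
F_age <- function(age, toddler.age = 48) {
  ifelse(age <= toddler.age,
         log(age + 1) - log(toddler.age + 1),
         (age - toddler.age) / (toddler.age + 1))
}

# Inverse of F, obtained by solving each branch for age.
# F(toddler.age) = 0, so the branch is chosen by the sign of x.
F_inverse <- function(x, toddler.age = 48) {
  ifelse(x <= 0,
         (toddler.age + 1) * exp(x) - 1,
         (toddler.age + 1) * x + toddler.age)
}

# Values printed in the Ages tibble (rounded as displayed)
linear_part <- c(Patient1 = 0.145, Patient2 = 3.08, Patient3 = 1.96,
                 Patient4 = 2.63, Patient5 = 1.90, Patient6 = 2.84)
predicted_months <- F_inverse(linear_part)
predicted_months
```

Under these assumptions, Patient1's 0.145 maps to about 55 months. Whether the back-transformed values agree with the recorded ages for every patient is a separate question that this sketch does not settle.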
The child-specific biological age prediction model was established through sure independence screening combined with multivariate linear modeling based on the elastic net algorithm. First, we used sure independence screening (implemented in the R package ‘SIS’) [20] to reduce the dimensionality of the ~21,000 β values in the datasets. This step was taken because variable selection methods (e.g., lasso, LARS, SCAD) do not perform well when the dimension of the predictor variable p is much larger than the sample size n. Then, an elastic net regression model (implemented in the glmnet R function) [21] was used to regress a transformed model of age based on 111 β values in the training data. The elastic net approach is a combination of traditional lasso and ridge regression methods, emphasizing model sparsity while appropriately balancing the contributions of correlated variables. The glmnet function requires the user to specify two parameters (alpha and lambda). Since we used an elastic net predictor, alpha was set to 0.48, and lambda was set to 0.000954 based on 10-fold cross-validation of the training data (via the R function cv.glmnet).
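To make the training procedure described above concrete, here is a rough sketch of fitting an elastic net on transformed age with `cv.glmnet` and mapping the predictions back to months. Everything in it is an assumption for illustration: the data are simulated, the sample and predictor counts are arbitrary, the sure independence screening step is omitted, and the helper names `F_age` and `F_inverse` are mine. Only alpha = 0.48, the 10-fold cross-validation and toddler.age = 48 are taken from the quoted text.

```r
library(glmnet)  # assumes the glmnet package is installed

set.seed(1)
n <- 80; p <- 20                       # simulated sizes, not the paper's
x <- matrix(runif(n * p), nrow = n)    # stand-in for beta values in [0, 1]
age <- runif(n, min = 1, max = 200)    # simulated ages in months

toddler.age <- 48
F_age <- function(age) ifelse(age <= toddler.age,
                              log(age + 1) - log(toddler.age + 1),
                              (age - toddler.age) / (toddler.age + 1))
F_inverse <- function(v) ifelse(v <= 0,
                                (toddler.age + 1) * exp(v) - 1,
                                (toddler.age + 1) * v + toddler.age)

# Fit on the transformed age; alpha is fixed, lambda is chosen by 10-fold CV
cvfit <- cv.glmnet(x, F_age(age), alpha = 0.48, nfolds = 10)

# Intercept plus one coefficient per predictor, at the CV-selected lambda
coefs <- coef(cvfit, s = "lambda.min")

# Predictions come out on the F scale and are mapped back to months
pred_F <- as.numeric(predict(cvfit, newx = x, s = "lambda.min"))
pred_months <- F_inverse(pred_F)
```

The point of the sketch is the scale: a model trained this way produces an intercept and coefficients whose weighted sum is F(age), not age in months.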