Problem with multiple regression coefficients

Hi! I am having trouble working with a regression that has many coefficients. I am trying to get the predicted value by applying the coefficients to the values in the New.data column. I have been doing the calculation manually and it is not working. Is there a formula or tool in R for this analysis?

ID Active.coefficients New.data
(Intercept) 2,376853787
AA1 -1,135997653 0.854
AA2 -2,254967927 0.254
AA3 1,205130235 0.542
AA4 -1,443114744 0.811
AA5 0,5572547 0.516
AA6 -4,753889226 0.723
AA7 -2,975525682 0.290
AA8 -0,877083763 0.844
AA9 -1,101457403 0.023
AA10 0,534293486 0.276

The predict function can be used to generate predicted values from a new data set.

(I notice that in the image you posted the Active.coefficients use a comma as the decimal separator and the New.data uses a period. Is that causing you problems?)

DF <- data.frame(AA1 = runif(10), AA2 = runif(10), AA3 = runif(10))
library(dplyr)

DF <- mutate(DF, Y = AA1 * 1.3 - AA2 * 0.3 + AA3 *0.4 )
DF$Y <- DF$Y + rnorm(10)
FIT <- lm(Y ~ AA1 + AA2 + AA3, data = DF)
summary(FIT)
#> 
#> Call:
#> lm(formula = Y ~ AA1 + AA2 + AA3, data = DF)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -1.0021 -0.4957 -0.2958  0.4881  1.4530 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)  
#> (Intercept)  -0.2921     0.7636  -0.383   0.7152  
#> AA1           2.8786     0.8327   3.457   0.0135 *
#> AA2          -0.5949     0.9310  -0.639   0.5465  
#> AA3          -0.7275     0.9049  -0.804   0.4521  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.9245 on 6 degrees of freedom
#> Multiple R-squared:  0.6663, Adjusted R-squared:  0.4995 
#> F-statistic: 3.994 on 3 and 6 DF,  p-value: 0.07029

NewData <- data.frame(AA1 = 0.5, AA2 = 0.3, AA3 = 0.8)
predict(FIT, newdata = NewData)
#>         1 
#> 0.3867466

Created on 2020-04-02 by the reprex package (v0.2.1)
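On the decimal-separator question raised above, here is a minimal sketch of how read.table treats comma decimals. The inline text layout is an assumption for illustration; the three values are copied from the posted table. Without dec = ",", the column is read as text and cannot be used in arithmetic.

```r
# Illustrative two-column text with decimal commas (layout is an assumption)
txt <- "ID Active.coefficients
AA1 -1,135997653
AA2 -2,254967927
AA3 1,205130235"

# Default dec = ".": the values cannot be parsed as numbers, so the column is character
wrong <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)
class(wrong$Active.coefficients)

# dec = ",": the values are parsed as numeric and can be multiplied
right <- read.table(text = txt, header = TRUE, dec = ",", stringsAsFactors = FALSE)
class(right$Active.coefficients)
right$Active.coefficients * 2
```

Checking class() or str() on the imported column is a quick way to confirm the coefficients arrived as numbers.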

Hi FJCC, thanks for the answer! The separator is not the problem, because I import this column with read.table("file", sep = ",").
I tried to apply your model, but I think my explanation was confusing.

I am working from a paper (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6914436/) that built an age prediction model from combined molecular characteristics (111 in total). The paper provides the regression coefficient for each of these characteristics. I am trying to apply these coefficients to the quantified values of the characteristics to obtain each individual's predicted age. I have been trying for weeks, but my background is not in mathematics or computer science, and I have tried everything I can think of.

The data look like this. The regression coefficients from the original work are:

CpGs Active.coefficients
(Intercept) 2,376853787
cg00343092 -1,135997653
cg00563932 -2,254967927
cg00571634 1,205130235
cg00629217 -1,443114744
cg01511567 0,5572547
cg01515426 -4,753889226
cg01756060 -2,975525682
cg01899253 -0,877083763
cg02385474 -1,101457403
cg02489552 0,534293486
cg02626929 0,128916398
cg02789485 0,353159992
cg03224418 -0,165348646
cg03340261 -6,689713985
cg03970609 0,563336196
cg04458548 -0,283434273
cg04460372 -0,967940889
cg04474832 -1,203668298
cg04527989 0,677609524
cg04784672 1,786235879
cg05073035 1,216035058
cg05228408 -0,609085835
cg05294455 0,910369194
cg05352668 -1,571369198
cg05921699 1,800240774
cg05995267 0,600433301
cg06204948 0,639126779
cg06495803 0,396327381
cg07408456 -2,025444379
cg08032971 0,688587081
cg08355340 7,896033939
cg08418332 1,173668556
cg08675585 -1,519238797
cg08711674 1,119106843
cg08724636 -6,996531515
cg08733315 -6,468543656
cg08872742 0,940541228
cg09013626 0,613837924
cg09600829 -0,732377239
cg09809672 -0,614857266
cg09816471 1,516437238
cg10052840 -0,923536683
cg10834677 0,797599347
cg10986043 -2,379443118
cg11267879 -0,554874255
cg11314292 -3,38499408
cg11635563 -0,586380758
cg12024906 1,17757674
cg12467090 1,860965096
cg12688670 0,186533062
cg12810837 -1,305939966
cg13018903 0,152892228
cg13269407 -0,157006414
cg13436343 3,64661945
cg13590277 1,591073245
cg13870494 0,189212303
cg13946500 -3,850796145
cg14093936 -0,902373468
cg14271400 5,710688531
cg14456683 2,071567414
cg14494596 1,371768443
cg14839898 1,815352105
cg15051063 1,366658344
cg15238200 2,004655666
cg15297650 -1,589846694
cg15416233 -2,474569514
cg15457899 0,461285026
cg15700197 -0,378835203
cg16466334 -0,519673323
cg16713808 2,149119757
cg16744741 -0,628229923
cg17397493 0,501381132
cg17421623 0,555595016
cg17536532 1,437845258
cg17907567 0,722904509
cg18081104 0,414502763
cg18644543 0,931062722
cg18804206 1,561205825
cg19162158 3,204798604
cg19654437 -0,721912821
cg19722847 -1,328174578
cg19761273 -1,792864828
cg20025656 1,883236071
cg20349377 -1,099138807
cg20654468 0,335556248
cg21057046 -1,746170745
cg21959619 -0,108227218
cg22705225 -1,211995542
cg22724153 -0,712185236
cg22747092 -0,155437965
cg22884082 0,62832282
cg22919728 -0,292299689
cg23320649 -0,87213435
cg23587449 -5,214476293
cg23668631 -0,54580146
cg24001070 1,273455384
cg24433189 0,475029669
cg24453664 1,335584886
cg24471894 -0,519909509
cg24949049 0,75191663
cg25152942 0,511762568
cg25398949 3,475324556
cg25538571 -0,430581867
cg25762706 1,555819249
cg25809905 -1,731180008
cg25827666 -1,920330281
cg26227465 0,34441217
cg26232558 0,386260531
cg26379475 -0,547110473
cg27003571 -0,675385959
cg27553955 1,215637677

And the data of the individuals would be:

CpGs Patient1 Patient2 Patient3 Patient4 Patient5 Patient6
cg00343092 0.15721 0.01721 0.0835 0.09762 0.12237 0.06944
cg00563932 0.60688 0.6156 0.59548 0.58357 0.5734 0.55717
cg00571634 0.08351 0.12219 0.09979 0.13198 0.1058 0.10946
cg00629217 0.84616 0.88872 0.86079 0.86194 0.87492 0.8389
cg01511567 0.26409 0.20216 0.16384 0.18531 0.2006 0.20192
cg01515426 0.01682 0.01337 0.01863 0.00866 0.01611 0.02004
cg01756060 0.01456 0.01345 0.01574 0.01668 0.01434 0.01526
cg01899253 0.34035 0.37414 0.37194 0.28808 0.28419 0.35136
cg02385474 0.83976 0.76332 0.84002 0.7836 0.82859 0.80264
cg02489552 0.37446 0.51986 0.43513 0.48885 0.44473 0.42185
cg02626929 0.20293 0.17587 0.17848 0.28723 0.16282 0.14623
cg02789485 0.36728 0.27761 0.24908 0.32464 0.27244 0.25471
cg03224418 0.51308 0.35162 0.39172 0.41432 0.39993 0.43563
cg03340261 0.02264 0.01599 0.01705 0.0138 0.01877 0.01718
cg03970609 0.72254 0.77394 0.69285 0.68143 0.70075 0.73901
cg04458548 0.87895 0.81816 0.77241 0.82116 0.88704 0.87619
cg04460372 0.18746 0.18652 0.20367 0.19732 0.18399 0.15137
cg04474832 0.34517 0.31593 0.30893 0.30676 0.32518 0.29056
cg04527989 0.71933 0.71528 0.77079 0.68082 0.84132 0.75996
cg04784672 0.03634 0.04044 0.02943 0.03627 0.0392 0.03284
cg05073035 0.08553 0.091 0.08716 0.11191 0.07434 0.10127
cg05228408 0.45748 0.43421 0.45368 0.47118 0.45669 0.44544
cg05294455 0.67084 0.62494 0.68739 0.62771 0.73867 0.73212
cg05352668 0.9087 0.85648 0.91095 0.8832 0.91448 0.91564
cg05921699 0.84026 0.8221 0.85909 0.82634 0.82772 0.86858
cg05995267 0.20053 0.18206 0.18958 0.17405 0.13563 0.17896
cg06204948 0.06649 0.06296 0.04071 0.06928 0.08785 0.04238
cg06495803 0.41438 0.50091 0.35867 0.45391 0.4396 0.47239
cg07408456 0.65063 0.48084 0.54465 0.55821 0.5203 0.4551
cg08032971 0.79365 0.8364 0.75933 0.78352 0.77349 0.7768
cg08355340 0.01597 0.02229 0.01436 0.01929 0.01692 0.01054
cg08418332 0.71415 0.69801 0.70208 0.73758 0.63011 0.69004
cg08675585 0.80022 0.75221 0.82609 0.79018 0.82477 0.81339
cg08711674 0.7634 0.74825 0.77294 0.74399 0.71052 0.78525
cg08724636 0.01245 0.00647 0.01477 0.01575 0.02192 0.00981
cg08733315 0.02088 0.01773 0.02546 0.0208 0.01801 0.01907
cg08872742 0.39669 0.29745 0.35466 0.33859 0.3553 0.32622
cg09013626 0.0528 0.0509 0.05463 0.0509 0.04223 0.0491
cg09600829 0.03388 0.03173 0.04444 0.0385 0.04698 0.03841
cg09809672 0.72143 0.6037 0.50056 0.55913 0.68374 0.55818
cg09816471 0.19934 0.23309 0.19529 0.27122 0.21101 0.23471
cg10052840 0.86945 0.78551 0.8597 0.89295 0.78624 0.83486
cg10834677 0.79854 0.78085 0.82319 0.80509 0.79179 0.81516
cg10986043 0.77062 0.70717 0.75222 0.67219 0.71146 0.76841
cg11267879 0.87045 0.82953 0.84099 0.82397 0.85228 0.84037
cg11314292 0.03309 0.03382 0.02648 0.04337 0.0356 0.03561
cg11635563 0.83155 0.84839 0.86401 0.86138 0.82716 0.85118
cg12024906 0.12843 0.20284 0.29331 0.16172 0.36719 0.30311
cg12467090 0.72175 0.79263 0.72342 0.77682 0.75492 0.77514
cg12688670 0.28446 0.20875 0.25606 0.22768 0.26597 0.21723
cg12810837 0.37557 0.36258 0.41961 0.28978 0.44128 0.45181
cg13018903 0.84728 0.85588 0.83968 0.87382 0.83659 0.85778
cg13269407 0.28306 0.27002 0.24351 0.1698 0.17548 0.2675
cg13436343 0.01063 0.00764 0.01074 0.01118 0.01093 0.01226
cg13590277 0.57157 0.56132 0.5699 0.60539 0.56594 0.61106
cg13870494 0.7386 0.76942 0.67896 0.70996 0.66272 0.74398
cg13946500 0.06106 0.03825 0.04108 0.04938 0.04514 0.04567
cg14093936 0.43508 0.40067 0.40306 0.39938 0.38223 0.40789
cg14271400 0.01819 0.01185 0.01431 0.00956 0.01659 0.01627
cg14456683 0.11991 0.16537 0.11308 0.15796 0.08629 0.123
cg14494596 0.26358 0.21844 0.22732 0.17478 0.2312 0.23422
cg14839898 0.72788 0.81654 0.78631 0.75761 0.79185 0.84206
cg15051063 0.86465 0.88379 0.90877 0.90578 0.91853 0.89638
cg15238200 0.89404 0.90208 0.89723 0.92504 0.90008 0.90732
cg15297650 0.56282 0.54457 0.52491 0.56457 0.53075 0.51551
cg15416233 0.01967 0.02199 0.02001 0.01811 0.02602 0.01342
cg15457899 0.02767 0.03629 0.02363 0.03105 0.02619 0.02399
cg15700197 0.77608 0.69387 0.78467 0.65067 0.67196 0.75394
cg16466334 0.87871 0.83123 0.84739 0.84383 0.82543 0.85098
cg16713808 0.84041 0.8645 0.82823 0.84576 0.85799 0.8534
cg16744741 0.58196 0.37294 0.53974 0.49274 0.43937 0.47771
cg17397493 0.69292 0.68085 0.70989 0.69495 0.62394 0.73776
cg17421623 0.28321 0.29394 0.2954 0.28258 0.46249 0.27452
cg17536532 0.26037 0.21048 0.17844 0.21114 0.17129 0.22396
cg17907567 0.55462 0.61811 0.61251 0.65021 0.50416 0.62007
cg18081104 0.67538 0.68875 0.73506 0.71126 0.65321 0.66715
cg18644543 0.02051 0.02069 0.02445 0.02347 0.01356 0.01821
cg18804206 0.39515 0.36285 0.38365 0.41037 0.39427 0.31803
cg19162158 0.01385 0.01985 0.02215 0.0203 0.01987 0.0178
cg19654437 0.68549 0.64168 0.64416 0.65894 0.65318 0.64696
cg19722847 0.21324 0.17639 0.17544 0.19931 0.19685 0.17071
cg19761273 0.36619 0.29827 0.30022 0.29201 0.2752 0.2937
cg20025656 0.08331 0.13489 0.12777 0.13157 0.09639 0.1132
cg20349377 0.79644 0.80886 0.79457 0.80498 0.78662 0.85975
cg20654468 0.13849 0.1002 0.14461 0.14948 0.09369 0.12834
cg21057046 0.35422 0.29929 0.35232 0.34243 0.3198 0.32908
cg21959619 0.38679 0.34031 0.36219 0.41944 0.35452 0.37467
cg22705225 0.33013 0.26357 0.33506 0.28535 0.32889 0.28853
cg22724153 0.79502 0.77029 0.72968 0.83414 0.7375 0.83588
cg22747092 0.39269 0.36508 0.36385 0.35323 0.34698 0.38473
cg22884082 0.80072 0.7714 0.80921 0.78903 0.77453 0.80949
cg22919728 0.4119 0.44201 0.46595 0.39188 0.41327 0.33866
cg23320649 0.63328 0.61863 0.56862 0.63232 0.60082 0.57153
cg23587449 0.04374 0.03254 0.03192 0.04116 0.03135 0.0376
cg23668631 0.36103 0.32555 0.34138 0.36661 0.27267 0.31648
cg24001070 0.01714 0.02357 0.02475 0.03267 0.02613 0.02908
cg24433189 0.73405 0.73147 0.72254 0.70199 0.7309 0.73507
cg24453664 0.36084 0.32744 0.33648 0.36059 0.29398 0.33018
cg24471894 0.23223 0.17337 0.20459 0.22823 0.15967 0.16978
cg24949049 0.89294 0.89647 0.85577 0.82345 0.83256 0.89099
cg25152942 0.71402 0.75849 0.76344 0.75332 0.75923 0.79908
cg25398949 0.01967 0.02218 0.0257 0.02746 0.0162 0.02195
cg25538571 0.49158 0.41101 0.39928 0.42806 0.38972 0.41715
cg25762706 0.76365 0.84042 0.87628 0.87753 0.84563 0.86421
cg25809905 0.80621 0.69499 0.6403 0.66809 0.78487 0.67715
cg25827666 0.90076 0.89731 0.89435 0.90512 0.87995 0.88086
cg26227465 0.8067 0.77085 0.77081 0.79984 0.82901 0.84195
cg26232558 0.85862 0.85358 0.80764 0.81245 0.86506 0.8545
cg26379475 0.81402 0.82447 0.82856 0.71156 0.84756 0.85061
cg27003571 0.8058 0.80214 0.6961 0.748 0.73604 0.78404
cg27553955 0.21235 0.29664 0.25394 0.25414 0.2268 0.29984

In the end, what I hope to get is a predicted age in months for each individual, like:

Predict_Age_Pat1 Predict_Age_Pat2 Predict_Age_Pat3 Predict_Age_Pat4 Predict_Age_Pat5 Predict_Age_Pat6
52 165 146 97 146 81

I made files out of the first few rows of the data you posted. The code below reshapes the data, matches the coefficients with the values measured for each patient, calculates coefficient * Value, and sums those results for each patient. Remember that I used only a few CpGs, so the results do not make sense, but the method will apply to the full data set. I also put in several head() calls to show what each step does.

Coef <- read.csv("c:/users/fjcc/Documents/R/Play/CpGs.csv", stringsAsFactors = FALSE)
Patients <- read.csv("c:/users/fjcc/Documents/R/Play/patients.txt", stringsAsFactors = FALSE)
Coef
#>          CpGs Active.coefficients
#> 1 (Intercept)           2.3768538
#> 2  cg00343092          -1.1359977
#> 3  cg00563932          -2.2549679
#> 4  cg00571634           1.2051302
#> 5  cg00629217          -1.4431147
#> 6  cg01511567           0.5572547
#> 7  cg01515426          -4.7538892
Patients
#>         CpGs Patient1 Patient2 Patient3 Patient4 Patient5 Patient6
#> 1 cg00343092  0.15721  0.01721  0.08350  0.09762  0.12237  0.06944
#> 2 cg00563932  0.60688  0.61560  0.59548  0.58357  0.57340  0.55717
#> 3 cg00571634  0.08351  0.12219  0.09979  0.13198  0.10580  0.10946
#> 4 cg00629217  0.84616  0.88872  0.86079  0.86194  0.87492  0.83890
#> 5 cg01511567  0.26409  0.20216  0.16384  0.18531  0.20060  0.20192
#> 6 cg01515426  0.01682  0.01337  0.01863  0.00866  0.01611  0.02004
library(dplyr)
library(tidyr)
PatientsTall <- Patients %>% gather(key = Patient, value = Value, Patient1:Patient6)
head(PatientsTall,8)
#>         CpGs  Patient   Value
#> 1 cg00343092 Patient1 0.15721
#> 2 cg00563932 Patient1 0.60688
#> 3 cg00571634 Patient1 0.08351
#> 4 cg00629217 Patient1 0.84616
#> 5 cg01511567 Patient1 0.26409
#> 6 cg01515426 Patient1 0.01682
#> 7 cg00343092 Patient2 0.01721
#> 8 cg00563932 Patient2 0.61560
PatientsTall <- inner_join(PatientsTall, Coef, by = "CpGs")
head(PatientsTall,8)
#>         CpGs  Patient   Value Active.coefficients
#> 1 cg00343092 Patient1 0.15721          -1.1359977
#> 2 cg00563932 Patient1 0.60688          -2.2549679
#> 3 cg00571634 Patient1 0.08351           1.2051302
#> 4 cg00629217 Patient1 0.84616          -1.4431147
#> 5 cg01511567 Patient1 0.26409           0.5572547
#> 6 cg01515426 Patient1 0.01682          -4.7538892
#> 7 cg00343092 Patient2 0.01721          -1.1359977
#> 8 cg00563932 Patient2 0.61560          -2.2549679
PatientsTall <- PatientsTall %>% mutate(CalcValue = Value * Active.coefficients)
head(PatientsTall,8)
#>         CpGs  Patient   Value Active.coefficients   CalcValue
#> 1 cg00343092 Patient1 0.15721          -1.1359977 -0.17859019
#> 2 cg00563932 Patient1 0.60688          -2.2549679 -1.36849494
#> 3 cg00571634 Patient1 0.08351           1.2051302  0.10064043
#> 4 cg00629217 Patient1 0.84616          -1.4431147 -1.22110597
#> 5 cg01511567 Patient1 0.26409           0.5572547  0.14716539
#> 6 cg01515426 Patient1 0.01682          -4.7538892 -0.07996042
#> 7 cg00343092 Patient2 0.01721          -1.1359977 -0.01955052
#> 8 cg00563932 Patient2 0.61560          -2.2549679 -1.38815826
Ages <- PatientsTall %>% group_by(Patient) %>% 
  summarize(Age = sum(CalcValue) + Coef[1,2]) #Coef[1,2] is the intercept
Ages #The Ages make no sense because I used only 6 coefficients.
#> # A tibble: 6 x 2
#>   Patient      Age
#>   <chr>      <dbl>
#> 1 Patient1 -0.223 
#> 2 Patient2 -0.117 
#> 3 Patient3 -0.180 
#> 4 Patient4 -0.0727
#> 5 Patient5 -0.155 
#> 6 Patient6 -0.0199

Created on 2020-04-02 by the reprex package (v0.3.0)
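The same sum of coefficient * Value can also be written as a matrix product, which some readers may find easier to check by hand. This is only a sketch on the first six CpGs and two patients, with values copied from the post; the object names (coefs, X, beta, linear_part) are made up for illustration.

```r
# First six coefficients plus the intercept, copied from the post
coefs <- c("(Intercept)" = 2.376853787,
           cg00343092 = -1.135997653,
           cg00563932 = -2.254967927,
           cg00571634 = 1.205130235,
           cg00629217 = -1.443114744,
           cg01511567 = 0.5572547,
           cg01515426 = -4.753889226)

# Measured values for the same CpGs in two patients
patients <- data.frame(
  CpGs = c("cg00343092", "cg00563932", "cg00571634",
           "cg00629217", "cg01511567", "cg01515426"),
  Patient1 = c(0.15721, 0.60688, 0.08351, 0.84616, 0.26409, 0.01682),
  Patient2 = c(0.01721, 0.6156, 0.12219, 0.88872, 0.20216, 0.01337),
  stringsAsFactors = FALSE
)

# Rows are CpGs, columns are patients
X <- as.matrix(patients[, -1])
rownames(X) <- patients$CpGs

# Align coefficients to the rows of X by name, then compute
# intercept + sum(coefficient * value) for every patient at once
beta <- coefs[rownames(X)]
linear_part <- coefs[["(Intercept)"]] + drop(t(X) %*% beta)
round(linear_part, 3)
```

Indexing the coefficients by CpG name keeps the two tables aligned even if their rows are in a different order. With these six CpGs the result agrees with the -0.223 and -0.117 shown for Patient1 and Patient2 in the tibble above.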

Hi, FJCC, THANK YOU again!
However, the final results applying all coefficients were as follows:

# A tibble: 6 x 2
  Patient    Age
  <chr>    <dbl>
1 Patient1 0.145
2 Patient2 3.08
3 Patient3 1.96
4 Patient4 2.63
5 Patient5 1.90
6 Patient6 2.84

I had done this calculation before in Excel, based on tutorials I read online, and got the same results. That is strange, because the values are very different from the ages. The ages (in months) that I posted in my last reply are the patients' actual ages.

Predict_Age_Pat1 Predict_Age_Pat2 Predict_Age_Pat3 Predict_Age_Pat4 Predict_Age_Pat5 Predict_Age_Pat6
52 165 146 97 146 81

See:

Coef <- read.table("coef.txt", stringsAsFactors = FALSE, header = TRUE, dec = ",")
head(Coef)
#>          CpGs Active.coefficients
#> 1 (Intercept)           2.3768538
#> 2  cg00343092          -1.1359977
#> 3  cg00563932          -2.2549679
#> 4  cg00571634           1.2051302
#> 5  cg00629217          -1.4431147
#> 6  cg01511567           0.5572547
Patients <- read.table("patients.txt", stringsAsFactors = FALSE, header = TRUE)
head(Patients)
#>         CpGs Patient1 Patient2 Patient3 Patient4 Patient5 Patient6
#> 1 cg00343092  0.15721  0.01721  0.08350  0.09762  0.12237  0.06944
#> 2 cg00563932  0.60688  0.61560  0.59548  0.58357  0.57340  0.55717
#> 3 cg00571634  0.08351  0.12219  0.09979  0.13198  0.10580  0.10946
#> 4 cg00629217  0.84616  0.88872  0.86079  0.86194  0.87492  0.83890
#> 5 cg01511567  0.26409  0.20216  0.16384  0.18531  0.20060  0.20192
#> 6 cg01515426  0.01682  0.01337  0.01863  0.00866  0.01611  0.02004
library(dplyr)
library(tidyr)
PatientsTall <- Patients %>% gather(key = Patient, value = Value, Patient1:Patient6)
head(PatientsTall); tail(PatientsTall)
#>         CpGs  Patient   Value
#> 1 cg00343092 Patient1 0.15721
#> 2 cg00563932 Patient1 0.60688
#> 3 cg00571634 Patient1 0.08351
#> 4 cg00629217 Patient1 0.84616
#> 5 cg01511567 Patient1 0.26409
#> 6 cg01515426 Patient1 0.01682
#>           CpGs  Patient   Value
#> 661 cg25827666 Patient6 0.88086
#> 662 cg26227465 Patient6 0.84195
#> 663 cg26232558 Patient6 0.85450
#> 664 cg26379475 Patient6 0.85061
#> 665 cg27003571 Patient6 0.78404
#> 666 cg27553955 Patient6 0.29984
PatientsTall <- inner_join(PatientsTall, Coef, by = "CpGs")
head(PatientsTall); tail(PatientsTall)
#>         CpGs  Patient   Value Active.coefficients
#> 1 cg00343092 Patient1 0.15721          -1.1359977
#> 2 cg00563932 Patient1 0.60688          -2.2549679
#> 3 cg00571634 Patient1 0.08351           1.2051302
#> 4 cg00629217 Patient1 0.84616          -1.4431147
#> 5 cg01511567 Patient1 0.26409           0.5572547
#> 6 cg01515426 Patient1 0.01682          -4.7538892
#>           CpGs  Patient   Value Active.coefficients
#> 661 cg25827666 Patient6 0.88086          -1.9203303
#> 662 cg26227465 Patient6 0.84195           0.3444122
#> 663 cg26232558 Patient6 0.85450           0.3862605
#> 664 cg26379475 Patient6 0.85061          -0.5471105
#> 665 cg27003571 Patient6 0.78404          -0.6753860
#> 666 cg27553955 Patient6 0.29984           1.2156377
PatientsTall <- PatientsTall %>% mutate(CalcValue = Value * Active.coefficients)
head(PatientsTall); tail(PatientsTall)
#>         CpGs  Patient   Value Active.coefficients   CalcValue
#> 1 cg00343092 Patient1 0.15721          -1.1359977 -0.17859019
#> 2 cg00563932 Patient1 0.60688          -2.2549679 -1.36849494
#> 3 cg00571634 Patient1 0.08351           1.2051302  0.10064043
#> 4 cg00629217 Patient1 0.84616          -1.4431147 -1.22110597
#> 5 cg01511567 Patient1 0.26409           0.5572547  0.14716539
#> 6 cg01515426 Patient1 0.01682          -4.7538892 -0.07996042
#>           CpGs  Patient   Value Active.coefficients  CalcValue
#> 661 cg25827666 Patient6 0.88086          -1.9203303 -1.6915421
#> 662 cg26227465 Patient6 0.84195           0.3444122  0.2899778
#> 663 cg26232558 Patient6 0.85450           0.3862605  0.3300596
#> 664 cg26379475 Patient6 0.85061          -0.5471105 -0.4653776
#> 665 cg27003571 Patient6 0.78404          -0.6753860 -0.5295296
#> 666 cg27553955 Patient6 0.29984           1.2156377  0.3644968
Ages <- PatientsTall %>% group_by(Patient) %>% summarize(Age = sum(CalcValue) + Coef[1,2])
Ages
#> # A tibble: 6 x 2
#>   Patient    Age
#>   <chr>    <dbl>
#> 1 Patient1 0.145
#> 2 Patient2 3.08
#> 3 Patient3 1.96
#> 4 Patient4 2.63
#> 5 Patient5 1.90
#> 6 Patient6 2.84
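One thing worth ruling out with this approach: inner_join() keeps only CpGs present in both tables, so a site missing from the patient file would silently drop out of the sum. In the output above, 666 rows for 6 patients works out to 111 matched CpGs, which agrees with the 111 characteristics in the paper, so nothing appears to be lost here. A quick check could look like the sketch below, where the small data frames are stand-ins for the real Coef and Patients objects.

```r
# Small stand-ins for the real Coef and Patients data frames
Coef <- data.frame(
  CpGs = c("(Intercept)", "cg00343092", "cg00563932", "cg00571634"),
  Active.coefficients = c(2.3768538, -1.1359977, -2.2549679, 1.2051302),
  stringsAsFactors = FALSE
)
Patients <- data.frame(
  CpGs = c("cg00343092", "cg00563932"),
  Patient1 = c(0.15721, 0.60688),
  stringsAsFactors = FALSE
)

# Model CpGs (intercept excluded) that have no measurement in Patients
missing_cpgs <- setdiff(Coef$CpGs[Coef$CpGs != "(Intercept)"], Patients$CpGs)
missing_cpgs

# Measured CpGs that the model does not use
unused_cpgs <- setdiff(Patients$CpGs, Coef$CpGs)
unused_cpgs
```

If missing_cpgs is empty on the real data, every coefficient contributed to the sum.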

In the methodology, the authors say they transformed age with a function F before training the model and used the inverse of F to convert the result back (see below). Do the values obtained from the coefficients need to be transformed? (I think not!)
I can't understand why it isn't working!

Based on the training set data, we found it advantageous to transform age using function F before building the prediction model. Using the inverse of function F, we transformed the linear part of the regression model into the DNA methylation age. Function F was as follows (toddler.age was set to 48 months):

F(age) = log(age + 1) − log(toddler.age + 1), if age ≤ toddler.age

F(age) = (age − toddler.age) / (toddler.age + 1), if age > toddler.age
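The quoted methodology says the linear part of the model is converted to DNA methylation age with the inverse of F, so the summed values may still be on the transformed scale. The sketch below inverts the two branches of F as quoted. It assumes log is the natural logarithm and toddler.age is 48 months; the function names inverse_F and F_age are made up for illustration and do not come from the paper.

```r
# Inverse of the quoted function F (assumes natural log, toddler_age = 48 months)
inverse_F <- function(y, toddler_age = 48) {
  ifelse(y <= 0,
         (toddler_age + 1) * exp(y) - 1,       # inverts the log branch (age <= toddler_age)
         (toddler_age + 1) * y + toddler_age)  # inverts the linear branch (age > toddler_age)
}

# Forward transform as quoted, used here only to check the round trip
F_age <- function(age, toddler_age = 48) {
  ifelse(age <= toddler_age,
         log(age + 1) - log(toddler_age + 1),
         (age - toddler_age) / (toddler_age + 1))
}

# Summed values posted above for Patient1 to Patient6
linear_part <- c(0.145, 3.08, 1.96, 2.63, 1.90, 2.84)
round(inverse_F(linear_part))

# Round-trip check on a few ages in months
all.equal(inverse_F(F_age(c(6, 48, 52, 165))), c(6, 48, 52, 165))
```

Applied to the posted values, this gives roughly 55, 199, 144, 177, 141 and 187 months. That is on the same scale as the reported ages (52, 165, 146, 97, 146, 81) and close for some patients but not all, so it is worth checking against the paper before relying on it.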

The child-specific biological age prediction model was established through sure independence screening combined with multivariate linear modeling based on the elastic net algorithm. First, we used sure independence screening (implemented in the R package ‘SIS’) [20] to reduce the dimensionality of the ~21,000 β values in the datasets. This step was taken because variable selection methods (e.g., lasso, LARS, SCAD) do not perform well when the dimension of the predictor variable p is much larger than the sample size n. Then, an elastic net regression model (implemented in the glmnet R function) [21] was used to regress a transformed model of age based on 111 β values in the training data. The elastic net approach is a combination of traditional lasso and ridge regression methods, emphasizing model sparsity while appropriately balancing the contributions of correlated variables. The glmnet function requires the user to specify two parameters (alpha and lambda). Since we used an elastic net predictor, alpha was set to 0.48, and lambda was set to 0.000954 based on 10-fold cross-validation of the training data (via the R function cv.glmnet).
