Hi! I'm having trouble working with a regression model that has a large number of coefficients. I'm trying to get the estimated (predicted) value for new data using these coefficients. I have been doing the calculations manually and it isn't working. Is there a formula or a tool in R for this analysis?
The predict function can be used to generate predicted values from a new data set.
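A small sketch with simulated data (the variables `x1`, `x2` and `y` are hypothetical) showing that `predict()` with `newdata` returns exactly the intercept plus the sum of coefficient × value. Note that `predict()` needs a fitted model object; when you only have coefficients published in a paper, that sum has to be computed directly.

```r
# Hypothetical toy data: two predictors (x1, x2) and a response (y)
set.seed(1)
train <- data.frame(x1 = runif(20), x2 = runif(20))
train$y <- 2 + 3 * train$x1 - 1.5 * train$x2 + rnorm(20, sd = 0.1)

# Fit a linear model, then predict for new observations
fit <- lm(y ~ x1 + x2, data = train)
new_data <- data.frame(x1 = c(0.2, 0.5), x2 = c(0.1, 0.9))
pred <- predict(fit, newdata = new_data)

# The same numbers by hand: intercept + sum of coefficient * value
b <- coef(fit)
manual <- b["(Intercept)"] + b["x1"] * new_data$x1 + b["x2"] * new_data$x2
all.equal(unname(pred), unname(manual))
```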
(I notice that in the image you posted the Active.coefficients use a comma as the decimal separator and the New.data uses a period. Is that causing you problems?)
Hi FJCC, thanks for the answer! The separator is not the problem, because I import this column with read.table ("file", sep = ",").
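One thing worth checking: in `read.table()`, `sep` sets the character that separates fields, while the decimal mark is set by `dec` (default `"."`). A sketch that reads a few of the coefficients with comma decimals; the semicolon field separator is an assumption about the file layout, since a comma cannot serve as both the field separator and the decimal mark.

```r
# Hypothetical file content: ";" separates fields, "," is the decimal mark
coef_text <- "CpGs;Active.coefficients
(Intercept);2,376853787
cg00343092;-1,135997653
cg00563932;-2,254967927"

# dec = "," tells R to read 2,376853787 as the number 2.376853787
coefs <- read.table(text = coef_text, header = TRUE, sep = ";", dec = ",")
str(coefs)  # Active.coefficients should be numeric, not character
```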
I tried to apply your approach, but I think my explanation was confusing.
My work is based on a study (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6914436/) that built an age-prediction model from a combination of molecular features (111 in total). The paper provides the regression coefficient for each of these features. I am trying to apply these coefficients to the measured values of the features in order to obtain each individual's predicted age. I have been trying to apply these coefficients for weeks, but my background is not in mathematics or computer science, and I have tried everything I can think of...
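Put as a formula, applying the published coefficients to individual $i$ means computing the linear part of the model: the intercept plus the sum, over the 111 CpGs, of each coefficient times that individual's value for the same CpG. Here $\beta_0$ is the `(Intercept)` row, $\beta_j$ the `Active.coefficients` entry for CpG $j$, and $x_{ij}$ the measured value of CpG $j$ for individual $i$ (these symbols are introduced only for illustration).

$$
\hat{y}_i = \beta_0 + \sum_{j=1}^{111} \beta_j \, x_{ij}
$$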
Here is what the data look like. The regression coefficients from the original study are:
| CpGs | Active.coefficients |
|---|---|
| (Intercept) | 2,376853787 |
| cg00343092 | -1,135997653 |
| cg00563932 | -2,254967927 |
| cg00571634 | 1,205130235 |
| cg00629217 | -1,443114744 |
| cg01511567 | 0,5572547 |
| cg01515426 | -4,753889226 |
| cg01756060 | -2,975525682 |
| cg01899253 | -0,877083763 |
| cg02385474 | -1,101457403 |
| cg02489552 | 0,534293486 |
| cg02626929 | 0,128916398 |
| cg02789485 | 0,353159992 |
| cg03224418 | -0,165348646 |
| cg03340261 | -6,689713985 |
| cg03970609 | 0,563336196 |
| cg04458548 | -0,283434273 |
| cg04460372 | -0,967940889 |
| cg04474832 | -1,203668298 |
| cg04527989 | 0,677609524 |
| cg04784672 | 1,786235879 |
| cg05073035 | 1,216035058 |
| cg05228408 | -0,609085835 |
| cg05294455 | 0,910369194 |
| cg05352668 | -1,571369198 |
| cg05921699 | 1,800240774 |
| cg05995267 | 0,600433301 |
| cg06204948 | 0,639126779 |
| cg06495803 | 0,396327381 |
| cg07408456 | -2,025444379 |
| cg08032971 | 0,688587081 |
| cg08355340 | 7,896033939 |
| cg08418332 | 1,173668556 |
| cg08675585 | -1,519238797 |
| cg08711674 | 1,119106843 |
| cg08724636 | -6,996531515 |
| cg08733315 | -6,468543656 |
| cg08872742 | 0,940541228 |
| cg09013626 | 0,613837924 |
| cg09600829 | -0,732377239 |
| cg09809672 | -0,614857266 |
| cg09816471 | 1,516437238 |
| cg10052840 | -0,923536683 |
| cg10834677 | 0,797599347 |
| cg10986043 | -2,379443118 |
| cg11267879 | -0,554874255 |
| cg11314292 | -3,38499408 |
| cg11635563 | -0,586380758 |
| cg12024906 | 1,17757674 |
| cg12467090 | 1,860965096 |
| cg12688670 | 0,186533062 |
| cg12810837 | -1,305939966 |
| cg13018903 | 0,152892228 |
| cg13269407 | -0,157006414 |
| cg13436343 | 3,64661945 |
| cg13590277 | 1,591073245 |
| cg13870494 | 0,189212303 |
| cg13946500 | -3,850796145 |
| cg14093936 | -0,902373468 |
| cg14271400 | 5,710688531 |
| cg14456683 | 2,071567414 |
| cg14494596 | 1,371768443 |
| cg14839898 | 1,815352105 |
| cg15051063 | 1,366658344 |
| cg15238200 | 2,004655666 |
| cg15297650 | -1,589846694 |
| cg15416233 | -2,474569514 |
| cg15457899 | 0,461285026 |
| cg15700197 | -0,378835203 |
| cg16466334 | -0,519673323 |
| cg16713808 | 2,149119757 |
| cg16744741 | -0,628229923 |
| cg17397493 | 0,501381132 |
| cg17421623 | 0,555595016 |
| cg17536532 | 1,437845258 |
| cg17907567 | 0,722904509 |
| cg18081104 | 0,414502763 |
| cg18644543 | 0,931062722 |
| cg18804206 | 1,561205825 |
| cg19162158 | 3,204798604 |
| cg19654437 | -0,721912821 |
| cg19722847 | -1,328174578 |
| cg19761273 | -1,792864828 |
| cg20025656 | 1,883236071 |
| cg20349377 | -1,099138807 |
| cg20654468 | 0,335556248 |
| cg21057046 | -1,746170745 |
| cg21959619 | -0,108227218 |
| cg22705225 | -1,211995542 |
| cg22724153 | -0,712185236 |
| cg22747092 | -0,155437965 |
| cg22884082 | 0,62832282 |
| cg22919728 | -0,292299689 |
| cg23320649 | -0,87213435 |
| cg23587449 | -5,214476293 |
| cg23668631 | -0,54580146 |
| cg24001070 | 1,273455384 |
| cg24433189 | 0,475029669 |
| cg24453664 | 1,335584886 |
| cg24471894 | -0,519909509 |
| cg24949049 | 0,75191663 |
| cg25152942 | 0,511762568 |
| cg25398949 | 3,475324556 |
| cg25538571 | -0,430581867 |
| cg25762706 | 1,555819249 |
| cg25809905 | -1,731180008 |
| cg25827666 | -1,920330281 |
| cg26227465 | 0,34441217 |
| cg26232558 | 0,386260531 |
| cg26379475 | -0,547110473 |
| cg27003571 | -0,675385959 |
| cg27553955 | 1,215637677 |
And the data for the individuals are:
| CpGs | Patient1 | Patient2 | Patient3 | Patient4 | Patient5 | Patient6 |
|---|---|---|---|---|---|---|
| cg00343092 | 0.15721 | 0.01721 | 0.0835 | 0.09762 | 0.12237 | 0.06944 |
| cg00563932 | 0.60688 | 0.6156 | 0.59548 | 0.58357 | 0.5734 | 0.55717 |
| cg00571634 | 0.08351 | 0.12219 | 0.09979 | 0.13198 | 0.1058 | 0.10946 |
| cg00629217 | 0.84616 | 0.88872 | 0.86079 | 0.86194 | 0.87492 | 0.8389 |
| cg01511567 | 0.26409 | 0.20216 | 0.16384 | 0.18531 | 0.2006 | 0.20192 |
| cg01515426 | 0.01682 | 0.01337 | 0.01863 | 0.00866 | 0.01611 | 0.02004 |
| cg01756060 | 0.01456 | 0.01345 | 0.01574 | 0.01668 | 0.01434 | 0.01526 |
| cg01899253 | 0.34035 | 0.37414 | 0.37194 | 0.28808 | 0.28419 | 0.35136 |
| cg02385474 | 0.83976 | 0.76332 | 0.84002 | 0.7836 | 0.82859 | 0.80264 |
| cg02489552 | 0.37446 | 0.51986 | 0.43513 | 0.48885 | 0.44473 | 0.42185 |
| cg02626929 | 0.20293 | 0.17587 | 0.17848 | 0.28723 | 0.16282 | 0.14623 |
| cg02789485 | 0.36728 | 0.27761 | 0.24908 | 0.32464 | 0.27244 | 0.25471 |
| cg03224418 | 0.51308 | 0.35162 | 0.39172 | 0.41432 | 0.39993 | 0.43563 |
| cg03340261 | 0.02264 | 0.01599 | 0.01705 | 0.0138 | 0.01877 | 0.01718 |
| cg03970609 | 0.72254 | 0.77394 | 0.69285 | 0.68143 | 0.70075 | 0.73901 |
| cg04458548 | 0.87895 | 0.81816 | 0.77241 | 0.82116 | 0.88704 | 0.87619 |
| cg04460372 | 0.18746 | 0.18652 | 0.20367 | 0.19732 | 0.18399 | 0.15137 |
| cg04474832 | 0.34517 | 0.31593 | 0.30893 | 0.30676 | 0.32518 | 0.29056 |
| cg04527989 | 0.71933 | 0.71528 | 0.77079 | 0.68082 | 0.84132 | 0.75996 |
| cg04784672 | 0.03634 | 0.04044 | 0.02943 | 0.03627 | 0.0392 | 0.03284 |
| cg05073035 | 0.08553 | 0.091 | 0.08716 | 0.11191 | 0.07434 | 0.10127 |
| cg05228408 | 0.45748 | 0.43421 | 0.45368 | 0.47118 | 0.45669 | 0.44544 |
| cg05294455 | 0.67084 | 0.62494 | 0.68739 | 0.62771 | 0.73867 | 0.73212 |
| cg05352668 | 0.9087 | 0.85648 | 0.91095 | 0.8832 | 0.91448 | 0.91564 |
| cg05921699 | 0.84026 | 0.8221 | 0.85909 | 0.82634 | 0.82772 | 0.86858 |
| cg05995267 | 0.20053 | 0.18206 | 0.18958 | 0.17405 | 0.13563 | 0.17896 |
| cg06204948 | 0.06649 | 0.06296 | 0.04071 | 0.06928 | 0.08785 | 0.04238 |
| cg06495803 | 0.41438 | 0.50091 | 0.35867 | 0.45391 | 0.4396 | 0.47239 |
| cg07408456 | 0.65063 | 0.48084 | 0.54465 | 0.55821 | 0.5203 | 0.4551 |
| cg08032971 | 0.79365 | 0.8364 | 0.75933 | 0.78352 | 0.77349 | 0.7768 |
| cg08355340 | 0.01597 | 0.02229 | 0.01436 | 0.01929 | 0.01692 | 0.01054 |
| cg08418332 | 0.71415 | 0.69801 | 0.70208 | 0.73758 | 0.63011 | 0.69004 |
| cg08675585 | 0.80022 | 0.75221 | 0.82609 | 0.79018 | 0.82477 | 0.81339 |
| cg08711674 | 0.7634 | 0.74825 | 0.77294 | 0.74399 | 0.71052 | 0.78525 |
| cg08724636 | 0.01245 | 0.00647 | 0.01477 | 0.01575 | 0.02192 | 0.00981 |
| cg08733315 | 0.02088 | 0.01773 | 0.02546 | 0.0208 | 0.01801 | 0.01907 |
| cg08872742 | 0.39669 | 0.29745 | 0.35466 | 0.33859 | 0.3553 | 0.32622 |
| cg09013626 | 0.0528 | 0.0509 | 0.05463 | 0.0509 | 0.04223 | 0.0491 |
| cg09600829 | 0.03388 | 0.03173 | 0.04444 | 0.0385 | 0.04698 | 0.03841 |
| cg09809672 | 0.72143 | 0.6037 | 0.50056 | 0.55913 | 0.68374 | 0.55818 |
| cg09816471 | 0.19934 | 0.23309 | 0.19529 | 0.27122 | 0.21101 | 0.23471 |
| cg10052840 | 0.86945 | 0.78551 | 0.8597 | 0.89295 | 0.78624 | 0.83486 |
| cg10834677 | 0.79854 | 0.78085 | 0.82319 | 0.80509 | 0.79179 | 0.81516 |
| cg10986043 | 0.77062 | 0.70717 | 0.75222 | 0.67219 | 0.71146 | 0.76841 |
| cg11267879 | 0.87045 | 0.82953 | 0.84099 | 0.82397 | 0.85228 | 0.84037 |
| cg11314292 | 0.03309 | 0.03382 | 0.02648 | 0.04337 | 0.0356 | 0.03561 |
| cg11635563 | 0.83155 | 0.84839 | 0.86401 | 0.86138 | 0.82716 | 0.85118 |
| cg12024906 | 0.12843 | 0.20284 | 0.29331 | 0.16172 | 0.36719 | 0.30311 |
| cg12467090 | 0.72175 | 0.79263 | 0.72342 | 0.77682 | 0.75492 | 0.77514 |
| cg12688670 | 0.28446 | 0.20875 | 0.25606 | 0.22768 | 0.26597 | 0.21723 |
| cg12810837 | 0.37557 | 0.36258 | 0.41961 | 0.28978 | 0.44128 | 0.45181 |
| cg13018903 | 0.84728 | 0.85588 | 0.83968 | 0.87382 | 0.83659 | 0.85778 |
| cg13269407 | 0.28306 | 0.27002 | 0.24351 | 0.1698 | 0.17548 | 0.2675 |
| cg13436343 | 0.01063 | 0.00764 | 0.01074 | 0.01118 | 0.01093 | 0.01226 |
| cg13590277 | 0.57157 | 0.56132 | 0.5699 | 0.60539 | 0.56594 | 0.61106 |
| cg13870494 | 0.7386 | 0.76942 | 0.67896 | 0.70996 | 0.66272 | 0.74398 |
| cg13946500 | 0.06106 | 0.03825 | 0.04108 | 0.04938 | 0.04514 | 0.04567 |
| cg14093936 | 0.43508 | 0.40067 | 0.40306 | 0.39938 | 0.38223 | 0.40789 |
| cg14271400 | 0.01819 | 0.01185 | 0.01431 | 0.00956 | 0.01659 | 0.01627 |
| cg14456683 | 0.11991 | 0.16537 | 0.11308 | 0.15796 | 0.08629 | 0.123 |
| cg14494596 | 0.26358 | 0.21844 | 0.22732 | 0.17478 | 0.2312 | 0.23422 |
| cg14839898 | 0.72788 | 0.81654 | 0.78631 | 0.75761 | 0.79185 | 0.84206 |
| cg15051063 | 0.86465 | 0.88379 | 0.90877 | 0.90578 | 0.91853 | 0.89638 |
| cg15238200 | 0.89404 | 0.90208 | 0.89723 | 0.92504 | 0.90008 | 0.90732 |
| cg15297650 | 0.56282 | 0.54457 | 0.52491 | 0.56457 | 0.53075 | 0.51551 |
| cg15416233 | 0.01967 | 0.02199 | 0.02001 | 0.01811 | 0.02602 | 0.01342 |
| cg15457899 | 0.02767 | 0.03629 | 0.02363 | 0.03105 | 0.02619 | 0.02399 |
| cg15700197 | 0.77608 | 0.69387 | 0.78467 | 0.65067 | 0.67196 | 0.75394 |
| cg16466334 | 0.87871 | 0.83123 | 0.84739 | 0.84383 | 0.82543 | 0.85098 |
| cg16713808 | 0.84041 | 0.8645 | 0.82823 | 0.84576 | 0.85799 | 0.8534 |
| cg16744741 | 0.58196 | 0.37294 | 0.53974 | 0.49274 | 0.43937 | 0.47771 |
| cg17397493 | 0.69292 | 0.68085 | 0.70989 | 0.69495 | 0.62394 | 0.73776 |
| cg17421623 | 0.28321 | 0.29394 | 0.2954 | 0.28258 | 0.46249 | 0.27452 |
| cg17536532 | 0.26037 | 0.21048 | 0.17844 | 0.21114 | 0.17129 | 0.22396 |
| cg17907567 | 0.55462 | 0.61811 | 0.61251 | 0.65021 | 0.50416 | 0.62007 |
| cg18081104 | 0.67538 | 0.68875 | 0.73506 | 0.71126 | 0.65321 | 0.66715 |
| cg18644543 | 0.02051 | 0.02069 | 0.02445 | 0.02347 | 0.01356 | 0.01821 |
| cg18804206 | 0.39515 | 0.36285 | 0.38365 | 0.41037 | 0.39427 | 0.31803 |
| cg19162158 | 0.01385 | 0.01985 | 0.02215 | 0.0203 | 0.01987 | 0.0178 |
| cg19654437 | 0.68549 | 0.64168 | 0.64416 | 0.65894 | 0.65318 | 0.64696 |
| cg19722847 | 0.21324 | 0.17639 | 0.17544 | 0.19931 | 0.19685 | 0.17071 |
| cg19761273 | 0.36619 | 0.29827 | 0.30022 | 0.29201 | 0.2752 | 0.2937 |
| cg20025656 | 0.08331 | 0.13489 | 0.12777 | 0.13157 | 0.09639 | 0.1132 |
| cg20349377 | 0.79644 | 0.80886 | 0.79457 | 0.80498 | 0.78662 | 0.85975 |
| cg20654468 | 0.13849 | 0.1002 | 0.14461 | 0.14948 | 0.09369 | 0.12834 |
| cg21057046 | 0.35422 | 0.29929 | 0.35232 | 0.34243 | 0.3198 | 0.32908 |
| cg21959619 | 0.38679 | 0.34031 | 0.36219 | 0.41944 | 0.35452 | 0.37467 |
| cg22705225 | 0.33013 | 0.26357 | 0.33506 | 0.28535 | 0.32889 | 0.28853 |
| cg22724153 | 0.79502 | 0.77029 | 0.72968 | 0.83414 | 0.7375 | 0.83588 |
| cg22747092 | 0.39269 | 0.36508 | 0.36385 | 0.35323 | 0.34698 | 0.38473 |
| cg22884082 | 0.80072 | 0.7714 | 0.80921 | 0.78903 | 0.77453 | 0.80949 |
| cg22919728 | 0.4119 | 0.44201 | 0.46595 | 0.39188 | 0.41327 | 0.33866 |
| cg23320649 | 0.63328 | 0.61863 | 0.56862 | 0.63232 | 0.60082 | 0.57153 |
| cg23587449 | 0.04374 | 0.03254 | 0.03192 | 0.04116 | 0.03135 | 0.0376 |
| cg23668631 | 0.36103 | 0.32555 | 0.34138 | 0.36661 | 0.27267 | 0.31648 |
| cg24001070 | 0.01714 | 0.02357 | 0.02475 | 0.03267 | 0.02613 | 0.02908 |
| cg24433189 | 0.73405 | 0.73147 | 0.72254 | 0.70199 | 0.7309 | 0.73507 |
| cg24453664 | 0.36084 | 0.32744 | 0.33648 | 0.36059 | 0.29398 | 0.33018 |
| cg24471894 | 0.23223 | 0.17337 | 0.20459 | 0.22823 | 0.15967 | 0.16978 |
| cg24949049 | 0.89294 | 0.89647 | 0.85577 | 0.82345 | 0.83256 | 0.89099 |
| cg25152942 | 0.71402 | 0.75849 | 0.76344 | 0.75332 | 0.75923 | 0.79908 |
| cg25398949 | 0.01967 | 0.02218 | 0.0257 | 0.02746 | 0.0162 | 0.02195 |
| cg25538571 | 0.49158 | 0.41101 | 0.39928 | 0.42806 | 0.38972 | 0.41715 |
| cg25762706 | 0.76365 | 0.84042 | 0.87628 | 0.87753 | 0.84563 | 0.86421 |
| cg25809905 | 0.80621 | 0.69499 | 0.6403 | 0.66809 | 0.78487 | 0.67715 |
| cg25827666 | 0.90076 | 0.89731 | 0.89435 | 0.90512 | 0.87995 | 0.88086 |
| cg26227465 | 0.8067 | 0.77085 | 0.77081 | 0.79984 | 0.82901 | 0.84195 |
| cg26232558 | 0.85862 | 0.85358 | 0.80764 | 0.81245 | 0.86506 | 0.8545 |
| cg26379475 | 0.81402 | 0.82447 | 0.82856 | 0.71156 | 0.84756 | 0.85061 |
| cg27003571 | 0.8058 | 0.80214 | 0.6961 | 0.748 | 0.73604 | 0.78404 |
| cg27553955 | 0.21235 | 0.29664 | 0.25394 | 0.25414 | 0.2268 | 0.29984 |
In the end, what I hope to get is a predicted age in months for each individual, like this:
I made files from the first few rows of the data you posted. The code below reshapes the data, matches the coefficients with the values measured for each patient, calculates coefficient * Value, and sums those results for each patient. Remember that I used only a few CpGs, so the results do not make sense, but the method will apply to the full data set. I also added several head() calls to show what each step does.
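A compact alternative to the reshape-and-sum approach described above, sketched in base R on the first three CpGs and three patients of the posted tables (decimal commas converted to periods). It looks up each CpG's coefficient by name with `match()`, so the rows line up even if the two tables are sorted differently, and it also adds the intercept, which the description above does not mention explicitly. The object names are illustrative, not taken from the original code.

```r
# First three CpGs from the posted tables (decimal commas converted to periods)
coefs <- data.frame(
  CpGs = c("(Intercept)", "cg00343092", "cg00563932", "cg00571634"),
  Active.coefficients = c(2.376853787, -1.135997653, -2.254967927, 1.205130235)
)
betas <- data.frame(
  CpGs     = c("cg00343092", "cg00563932", "cg00571634"),
  Patient1 = c(0.15721, 0.60688, 0.08351),
  Patient2 = c(0.01721, 0.6156, 0.12219),
  Patient3 = c(0.0835, 0.59548, 0.09979)
)

# Intercept: the row labelled "(Intercept)"
intercept <- coefs$Active.coefficients[coefs$CpGs == "(Intercept)"]

# Coefficient for each CpG in the data, looked up by name
slopes <- coefs$Active.coefficients[match(betas$CpGs, coefs$CpGs)]
stopifnot(!anyNA(slopes))  # every CpG in the data must have a coefficient

# Multiply each row (CpG) by its coefficient, then sum per patient (column)
values <- as.matrix(betas[, -1])
linear_part <- intercept + colSums(values * slopes)
linear_part
```

With the full tables the same lines work unchanged. The result is the linear part of the model only; according to the methods quoted later in this thread, it is still on the transformed age scale.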
I had already done this calculation in Excel, following tutorials I found online, and I got the same results. That is strange, because the results are very different from the actual ages. The ages (in months) that I posted in my last reply are the patients' original ages.
In the methods, the authors say they transformed age with a function F before training the model and used the inverse of F to convert the model's output back to age (see below). Do the values I get from the coefficients need to be transformed? (I think not!)
I can't understand why it isn't working!
Based on the training set data, we found it advantageous to transform age using function F before building the prediction model. Using the inverse of function F, we transformed the linear part of the regression model into the DNA methylation age. Function F was as follows (toddler.age was set to 48 months):
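The passage above says the linear part of the model is converted to DNA methylation age with the inverse of F, which suggests that intercept + Σ coefficient × value is on the transformed scale, not in months. The formula for F did not survive in this post, so the sketch below assumes it has the same shape as the age transformation used in Horvath's (2013) epigenetic clock, with Horvath's `adult.age` (20 years) replaced by `toddler.age` = 48 months. The function names `F_transform` and `F_inverse` are hypothetical; check the definition against the formula printed in the paper before relying on it.

```r
# ASSUMPTION: F has the same form as Horvath's (2013) age transformation,
# with toddler.age = 48 months in place of Horvath's adult.age.
# Verify against the formula printed in the paper.
F_transform <- function(age, toddler.age = 48) {
  ifelse(age <= toddler.age,
         log(age + 1) - log(toddler.age + 1),     # log branch up to toddler.age
         (age - toddler.age) / (toddler.age + 1))  # linear branch above it
}

F_inverse <- function(x, toddler.age = 48) {
  ifelse(x <= 0,
         (toddler.age + 1) * exp(x) - 1,          # undo the log branch
         (toddler.age + 1) * x + toddler.age)      # undo the linear branch
}

# Hypothetical linear-part values (intercept + sum of coefficient * value)
linear_part <- c(Patient1 = -0.5, Patient2 = 0.2)
predicted_age_months <- F_inverse(linear_part)
round(predicted_age_months, 1)
```

The two branches meet at `toddler.age`, where F equals 0, so negative linear parts map to ages below 48 months and positive ones to ages above it.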
The child-specific biological age prediction model was established through sure independence screening combined with multivariate linear modeling based on the elastic net algorithm. First, we used sure independence screening (implemented in the R package ‘SIS’) [20] to reduce the dimensionality of the ~21,000 β values in the datasets. This step was taken because variable selection methods (e.g., lasso, LARS, SCAD) do not perform well when the dimension of the predictor variable p is much larger than the sample size n. Then, an elastic net regression model (implemented in the glmnet R function) [21] was used to regress a transformed model of age based on 111 β values in the training data. The elastic net approach is a combination of traditional lasso and ridge regression methods, emphasizing model sparsity while appropriately balancing the contributions of correlated variables. The glmnet function requires the user to specify two parameters (alpha and lambda). Since we used an elastic net predictor, alpha was set to 0.48, and lambda was set to 0.000954 based on 10-fold cross-validation of the training data (via the R function cv.glmnet).
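For reference, this is the objective that glmnet minimizes for a Gaussian response, as given in the glmnet documentation. Here $y_i$ would be the transformed age $F(\text{age}_i)$ and $x_i$ the vector of 111 β values for sample $i$; alpha = 0.48 mixes the ridge ($\ell_2$) and lasso ($\ell_1$) penalties, and lambda = 0.000954 sets the overall penalty strength. The penalty only affects how the coefficients are estimated: once they are published, a prediction uses just $\beta_0$ and $\beta$.

$$
\min_{\beta_0,\,\beta}\; \frac{1}{2N}\sum_{i=1}^{N}\left(y_i - \beta_0 - x_i^{\top}\beta\right)^2 + \lambda\left[\frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 + \alpha\,\lVert\beta\rVert_1\right]
$$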