Commands & outputs for file directory & column labels:
colnames(COPD)
[1] "X" "ID" "AGE" "PackHistory" "COPDSEVERITY" "MWT1" "MWT2"
[8] "MWT1Best" "FEV1" "FEV1PRED" "FVC" "FVCPRED" "CAT" "HAD"
[15] "SGRQ" "AGEquartiles" "copd" "gender" "smoking" "Diabetes" "muscular"
[22] "hypertension" "AtrialFib" "IHD"
Commands & outputs for assessing key features of a dataset in R:
dim(COPD)
[1] 101 24
The above command shows that the dataset has 101 patients (rows) & 24 variables (columns).
Commands & outputs for checking data in R:
class(COPD$Diabetes)
[1] "integer"
COPD$Diabetes
[1] 1 1 1 0 0 1 1 1 1 0 1 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0
[59] 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
class(COPD$AtrialFib)
[1] "integer"
COPD$AtrialFib
[1] 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0
[59] 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
class(COPD$IHD)
[1] "integer"
COPD$IHD
[1] 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[59] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0 1 0 0 0
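A more compact way to check 0/1 columns like these is to tabulate them instead of printing every value. The sketch below uses a small made-up vector as a stand-in for a column such as COPD$Diabetes (the values are illustrative, not taken from the dataset), and only base R functions:

```r
# Made-up 0/1 indicator standing in for a column such as COPD$Diabetes
diabetes <- c(1L, 1L, 0L, 0L, 1L, 0L, NA, 0L)

# Storage type of the column
class(diabetes)

# Counts per value, including missing values if there are any
counts <- table(diabetes, useNA = "ifany")
print(counts)

# Optional: a labelled factor, so model output is easier to read
diabetes_f <- factor(diabetes, levels = c(0, 1), labels = c("no", "yes"))
print(levels(diabetes_f))
```

The same two calls (class() and table()) could be applied to COPD$AtrialFib and COPD$IHD.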
Commands & outputs for making predictions from the fitted model r2:
prediction(r2, at = list(Diabetes = c(0,1), AtrialFib = c(0,1)))
Data frame with 400 predictions from
lm(formula = MWT1Best ~ factor(Diabetes) + factor(AtrialFib) +
factor(Diabetes * AtrialFib), data = COPD)
with average predictions:
 Diabetes AtrialFib     x
        0         0 428.1
        1         0 420.5
        0         1 356.1
        1         1 218.3
prediction(r2, at = list(Diabetes = c(0,1), IHD = c(0,1)))
Data frame with 400 predictions from
lm(formula = MWT1Best ~ factor(Diabetes) + factor(AtrialFib) +
factor(Diabetes * AtrialFib), data = COPD)
with average predictions:
 Diabetes IHD     x
        0   0 413.7
        1   0 380.0
        0   1 413.7
        1   1 380.0
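Note that the second call reuses r2, whose echoed formula contains Diabetes and AtrialFib but not IHD, and the averages are identical for IHD = 0 and IHD = 1. A likely explanation is that a variable outside the model formula cannot change the predictions. The sketch below illustrates this with simulated data and base R predict(); the data frame toy and the models m_without and m_with are hypothetical names, not objects from the session above:

```r
# Simulated stand-in data (not the real COPD dataset)
set.seed(1)
toy <- expand.grid(Diabetes = c(0, 1), IHD = c(0, 1), rep = 1:25)
toy$MWT1Best <- 400 - 30 * toy$Diabetes - 50 * toy$IHD +
  rnorm(nrow(toy), sd = 20)

# Model that ignores IHD
m_without <- lm(MWT1Best ~ factor(Diabetes), data = toy)

# Hypothetical model that includes IHD and its interaction with Diabetes
m_with <- lm(MWT1Best ~ factor(Diabetes) * factor(IHD), data = toy)

# One row per Diabetes/IHD combination
grid <- expand.grid(Diabetes = c(0, 1), IHD = c(0, 1))
grid$without_IHD <- predict(m_without, newdata = grid)
grid$with_IHD <- predict(m_with, newdata = grid)

# without_IHD repeats across IHD levels; with_IHD does not
print(grid)
```

If predictions by IHD are wanted, the model passed to prediction() would presumably need IHD in its formula.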
With 101 patients & 24 variables, why am I getting 400 predictions, which seems like too many?
What can I do to fix the number of predictions?
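One plausible reading of the 400: with the at argument, prediction() appears to stack one full copy of the model data per combination of the at values, so 4 combinations times 100 usable rows gives 400 (one of the 101 patients may have been dropped for a missing value, though that is an assumption). The four-row table of average predictions would then already be the summary being asked for. The sketch below reproduces the arithmetic and obtains exactly one prediction per combination with base R predict(); COPD_toy and r2_toy are simulated, hypothetical stand-ins for COPD and r2:

```r
# Simulated stand-in with 100 complete rows (not the real COPD data)
set.seed(42)
COPD_toy <- expand.grid(Diabetes = c(0, 1), AtrialFib = c(0, 1), rep = 1:25)
COPD_toy$MWT1Best <- 430 - 10 * COPD_toy$Diabetes - 70 * COPD_toy$AtrialFib -
  130 * COPD_toy$Diabetes * COPD_toy$AtrialFib +
  rnorm(nrow(COPD_toy), sd = 30)

# Same formula structure as the model echoed in the output above
r2_toy <- lm(MWT1Best ~ factor(Diabetes) + factor(AtrialFib) +
               factor(Diabetes * AtrialFib), data = COPD_toy)

# The four combinations requested via `at`
grid <- expand.grid(Diabetes = c(0, 1), AtrialFib = c(0, 1))

# Stacking one copy of the data per combination: 100 rows * 4 = 400
n_stacked <- nrow(COPD_toy) * nrow(grid)
print(n_stacked)

# One prediction per combination instead of one per stacked row
grid$fitted <- predict(r2_toy, newdata = grid)
print(grid)
```

Because this model has a separate coefficient for every Diabetes/AtrialFib cell, each fitted value here equals the mean of MWT1Best in that cell of the simulated data.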