Stepwise regression

I am trying to run a stepwise regression in both directions (forward and backward) to determine which variables have the most impact on ThrowDistance. I am stuck on the formatting and on which code is best for comparing such a large number of variables. I also want to run an ANOVA test, but I do not know the code for that either.

The first image shows the data labels, so they can be referenced when building the code, and the second image shows what I have completed so far.

Anything helps, thank you so much in advance!

Here is the dataset that I used:
Vehicle,Speed,FrictionVehPed,FrictionPedGround,Restitution,ThrowDisiance,Percentile,Gender,CGHeight,Stature,Weight
ToyotaCamry,5,0.2,1.2,0.3,4.7,95,Male,40.1,62.65,221
ToyotaCamry,10,0.25,1.25,0.35,4.52,95,Male,40.6,63.15,221.5
ToyotaCamry,15,0.3,1.3,0.4,11.16,95,Male,41.1,63.65,222
ToyotaCamry,20,0.35,1.35,0.45,20.52,95,Male,41.6,64.15,222.5
ToyotaCamry,25,0.4,1.4,0.5,30.48,95,Male,40.1,62.65,221
ToyotaCamry,30,0.45,1.45,0.55,42.75,95,Male,40.1,62.65,221
ToyotaCamry,35,0.2,0.2,0.2,49.95,95,Male,40.6,63.15,221.5
ToyotaCamry,40,0.2,0.2,0.2,65.18,95,Male,41.1,63.65,222
ToyotaCamry,45,0.2,0.2,0.2,88.72,95,Male,41.6,64.15,222.5
ToyotaCamry,50,0.25,0.25,0.25,110.16,95,Female,42.1,64.65,223
ToyotaCamry,55,0.3,0.3,0.3,132.22,95,Female,42.6,65.15,223.5
ToyotaCamry,60,0.35,0.35,0.35,152.28,95,Female,41.85,64.4,222.75
ToyotaCamry,65,0.4,0.4,0.4,194.59,95,Female,41.1,63.65,222
ToyotaCamry,70,0.2,0.2,0.2,211.65,95,Female,40.35,62.9,221.25
ChevroletSuburban,5,0.2,0.2,0.2,4.78,95,Female,39.6,62.15,220.5
ChevroletSuburban,10,0.25,0.25,0.25,4.73,95,Female,40.1,62.65,221
ChevroletSuburban,15,0.3,0.3,0.3,15.86,95,Female,40.6,63.15,221.5
ChevroletSuburban,20,0.35,0.35,0.35,30.07,95,Female,39.85,62.4,220.75

Here is the code for the ANOVA output:

#upload dataset (select R community attached)
mydata <- read.csv(file.choose(),header = T,sep = ',', stringsAsFactors=FALSE);

#define intercept-only model
intercept_only <- lm(ThrowDisiance ~ 1, data=mydata)

#define model with all predictors
all <- lm(ThrowDisiance ~ ., data=mydata)

#perform forward stepwise regression
forward <- step(intercept_only, direction='forward', scope=formula(all), trace=0)

#view results of forward stepwise regression
xx <- forward$anova

aa<- as.list(rownames(xx))
n<-length(aa)
#variable 1
x <- as.list( aa[1:n])
#Variable 2
#every bar in the waterfall is a "relative" measure
wfmeasure <- as.list(rep("relative", n))
#Variable 3
l <-c()
i=1
totResidDev <- round(xx$"Resid. Dev"[1],3)
while(i<=n) {
if (i==1){
b<-paste(as.character(sprintf(round(xx$"Resid. Dev"[1]/totResidDev*100,3), fmt="%#.2f")),"%")
}
else{
b<-paste(as.character(sprintf(round(xx$Deviance[i]*-1/totResidDev*100,3), fmt="%#.2f")),"%")
}
l<-c(l,b)
i=i+1
}
text <- as.list(l)

text <- as.list(as.character(round(xx$coefficients[2:n,1],3)))

#Variable 4
l <-c()
i=1
while(i<=n) {
if (i==1){
b<-sprintf(round(xx$"Resid. Dev"[1],3), fmt="%#.3f")
}
else{
b<-sprintf(round(xx$Deviance[i]*-1,3), fmt="%#.3f")
}
l<-c(l,b)
i=i+1
}
y <- as.list(l)
datawaterfall = data.frame(x=factor(x,levels=x),wfmeasure,text,y)

f <- list( #plotly chart axis names /naming: where you got: Axes Labels | R | Plotly
family = "Courier New, monospace",
size = 15,
color = "#000000"
)
xname <- list(
title = "Predictors",
titlefont = f
)
yname <- list(
title = "% Deviance from Intercept",
titlefont = f
)

plot_ly(
datawaterfall, name = "20", type = "waterfall", measure = ~wfmeasure,
x = ~x, textposition = "outside", y= ~y, text =~text,
connector = list(line = list(color= "rgb(63, 63, 63)")))%>%
layout(xaxis = xname, yaxis = yname)%>%
layout(xaxis = list(tickfont = list(size = 15)),
yaxis = list(tickfont = list(size = 5)))
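
For the "both ways" and ANOVA parts of the question, here is a minimal sketch using only base R, not a definitive analysis. It reuses the rows posted above and keeps the ThrowDisiance spelling from the file. The names full_model and both are my own. I am also assuming it is reasonable to leave out Percentile (constant in these rows) and Stature, Weight and Restitution (in these rows they are exact linear combinations of other columns), so they are not in the upper scope.

```r
# Rows copied from the dataset posted in this thread (column names as in the file).
csv_text <- "Vehicle,Speed,FrictionVehPed,FrictionPedGround,Restitution,ThrowDisiance,Percentile,Gender,CGHeight,Stature,Weight
ToyotaCamry,5,0.2,1.2,0.3,4.7,95,Male,40.1,62.65,221
ToyotaCamry,10,0.25,1.25,0.35,4.52,95,Male,40.6,63.15,221.5
ToyotaCamry,15,0.3,1.3,0.4,11.16,95,Male,41.1,63.65,222
ToyotaCamry,20,0.35,1.35,0.45,20.52,95,Male,41.6,64.15,222.5
ToyotaCamry,25,0.4,1.4,0.5,30.48,95,Male,40.1,62.65,221
ToyotaCamry,30,0.45,1.45,0.55,42.75,95,Male,40.1,62.65,221
ToyotaCamry,35,0.2,0.2,0.2,49.95,95,Male,40.6,63.15,221.5
ToyotaCamry,40,0.2,0.2,0.2,65.18,95,Male,41.1,63.65,222
ToyotaCamry,45,0.2,0.2,0.2,88.72,95,Male,41.6,64.15,222.5
ToyotaCamry,50,0.25,0.25,0.25,110.16,95,Female,42.1,64.65,223
ToyotaCamry,55,0.3,0.3,0.3,132.22,95,Female,42.6,65.15,223.5
ToyotaCamry,60,0.35,0.35,0.35,152.28,95,Female,41.85,64.4,222.75
ToyotaCamry,65,0.4,0.4,0.4,194.59,95,Female,41.1,63.65,222
ToyotaCamry,70,0.2,0.2,0.2,211.65,95,Female,40.35,62.9,221.25
ChevroletSuburban,5,0.2,0.2,0.2,4.78,95,Female,39.6,62.15,220.5
ChevroletSuburban,10,0.25,0.25,0.25,4.73,95,Female,40.1,62.65,221
ChevroletSuburban,15,0.3,0.3,0.3,15.86,95,Female,40.6,63.15,221.5
ChevroletSuburban,20,0.35,0.35,0.35,30.07,95,Female,39.85,62.4,220.75"
mydata <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# Smallest and largest models that step() may move between.
intercept_only <- lm(ThrowDisiance ~ 1, data = mydata)
# Assumption: constant and exactly collinear columns are left out (see note above).
full_model <- lm(ThrowDisiance ~ Vehicle + Speed + FrictionVehPed +
                   FrictionPedGround + Gender + CGHeight, data = mydata)

# direction = "both" lets each step either add or drop a term, judged by AIC.
both <- step(intercept_only, direction = "both",
             scope = list(lower = formula(intercept_only),
                          upper = formula(full_model)),
             trace = 0)

both$anova                   # which term was added or dropped at each step
summary(both)                # coefficients of the selected model
anova(both)                  # sequential ANOVA table for the selected model
anova(intercept_only, both)  # F test: selected model vs. intercept-only
```

With only 18 rows the selection is not going to be very stable, so I would treat the output as exploratory.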

Thank you for your help! I am getting this error in the console though at the very end and I don't know how to debug/fix it.

plot_ly(
+ datawaterfall, name = "20", type = "waterfall", measure = ~wfmeasure,
+ x = ~x, textposition = "outside", y= ~y, text =~text,
+ connector = list(line = list(color= "rgb(63, 63, 63)"))) %>%
+ layout(xaxis = xname, yaxis = yname) %>%
+ layout(xaxis = list(tickfont = list(size = 15)),
+      yaxis = list(tickfont = list(size = 5)))

Error in layout(., xaxis = list(tickfont = list(size = 15)), yaxis = list(tickfont = list(size = 5))) :
unused arguments (xaxis = list(tickfont = list(size = 15)), yaxis = list(tickfont = list(size = 5)))

Hi
Two things to check.

  1. Please check whether you have added library(plotly) at the beginning of the script.
  2. There is no way to attach dataset.csv here, so I have attached an image of the dataset instead. Please open the dataset from my previous reply in Excel and validate it against this image.
    Finally, please remove one unwanted line of code:
    text <- as.list(as.character(round(xx$coefficients[2:n,1],3)))
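
On the first point: the "unused arguments (xaxis = ..., yaxis = ...)" message is what you would see if layout() resolved to the base graphics function of the same name, which has no xaxis or yaxis arguments. That is a likely cause, not a confirmed one. Here is a minimal sketch that sidesteps it by calling plotly::layout() explicitly. It assumes the plotly package is installed, and the data frame wf holds made-up values for illustration only, not results from the model in this thread.

```r
library(plotly)

# Hypothetical values for illustration; not output from the stepwise model.
wf <- data.frame(
  x = factor(c("Intercept", "A", "B"), levels = c("Intercept", "A", "B")),
  measure = "relative",
  y = c(100, -60, -25)
)

p <- plotly::plot_ly(
  wf, type = "waterfall", measure = ~measure,
  x = ~x, y = ~y, text = ~y, textposition = "outside",
  connector = list(line = list(color = "rgb(63, 63, 63)"))
)

# The plotly:: prefix makes sure graphics::layout() is not picked up instead.
p <- plotly::layout(
  p,
  xaxis = list(title = "Predictors", tickfont = list(size = 15)),
  yaxis = list(title = "% Deviance from Intercept", tickfont = list(size = 5))
)
p
```

If this runs but the original code still fails, check the order in which your packages are loaded, since a package attached after plotly can mask layout().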


Hi, please consider methods other than stepwise:

https://towardsdatascience.com/stopping-stepwise-why-stepwise-selection-is-bad-and-what-you-should-use-instead-90818b3f52df

https://journalofbigdata.springeropen.com/articles/10.1186/s40537-018-0143-6
