I have my linear model made and now need to refine it by first identifying variables that are significant to sales.
model1 <- lm(Final_Sales_df$'Actual Net Sales' ~ ., data = Final_Sales_df[,c(4,5,6:91)])
I have exported the model summary into sheet 1 of excel, however it does not have the significant codes associated with each variable. Other than manually going down each row to identify what variables are significant in excel, is there an easy way to add another column in my excel file displaying the significant *'s?
Lastly, how would I rerun the model using just the significant variables that have been identified?
Thanks in advance for your help.
FJCC
November 15, 2022, 10:14pm
2
You could use the broom package to make a neat data frame of the fit summary and add a column with the significance symbols.
set.seed(123)
DF <- data.frame(X1 = rnorm(25),
X2 = runif(25,min = 5, max = 15))
DF$Y <- DF$X1 * 2.5 - DF$X2 * 0.95 + 6 + rnorm(25, mean = 0, sd = 4)
FIT <- lm(Y ~ X1 + X2, data = DF)
summary(FIT)
#>
#> Call:
#> lm(formula = Y ~ X1 + X2, data = DF)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -5.5982 -2.8848 -0.6088 1.7808 6.9164
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 5.9890 2.9948 2.000 0.05802 .
#> X1 2.9134 0.8674 3.359 0.00284 **
#> X2 -0.9436 0.2901 -3.253 0.00365 **
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 3.959 on 22 degrees of freedom
#> Multiple R-squared: 0.5471, Adjusted R-squared: 0.5059
#> F-statistic: 13.29 on 2 and 22 DF, p-value: 0.0001644
FIT_tidy <- broom::tidy(FIT)
FIT_tidy
#> # A tibble: 3 × 5
#> term estimate std.error statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) 5.99 2.99 2.00 0.0580
#> 2 X1 2.91 0.867 3.36 0.00284
#> 3 X2 -0.944 0.290 -3.25 0.00365
library(dplyr)
FIT_tidy <- FIT_tidy |>
mutate(Sig. = case_when(
p.value <= 0.001 ~ "***",
p.value > 0.001 & p.value <= 0.01 ~ "**",
p.value > 0.01 & p.value <= 0.05 ~ "*",
p.value > 0.05 & p.value <= 0.10 ~ ".",
TRUE ~""
))
FIT_tidy
#> # A tibble: 3 × 6
#> term estimate std.error statistic p.value Sig.
#> <chr> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 (Intercept) 5.99 2.99 2.00 0.0580 .
#> 2 X1 2.91 0.867 3.36 0.00284 **
#> 3 X2 -0.944 0.290 -3.25 0.00365 **
Created on 2022-11-15 with reprex v2.0.2
FJCC:
FIT_tidy <- FIT_tidy |>
mutate(Sig. = case_when(
p.value <= 0.001 ~ "***",
p.value > 0.001 & p.value <= 0.01 ~ "**",
p.value > 0.01 & p.value <= 0.05 ~ "*",
p.value > 0.05 & p.value <= 0.10 ~ ".",
TRUE ~""
))
FIT_tidy
Awesome, that worked and now I have the significance codes in my excel file. Thank you! Now, how do I rerun the regression model with just those significant variables?
FJCC
November 15, 2022, 11:08pm
4
I added a third variable to the fit and wrote some code to construct a formula from only the "significant" variables.
set.seed(123)
DF <- data.frame(X1 = rnorm(25),
X2 = runif(25,min = 5, max = 15),
X3 = rnorm(25))
DF$Y <- DF$X1 * 2.5 - DF$X2 * 0.95 + 6 + rnorm(25, mean = 0, sd = 4)
FIT <- lm(Y ~ X1 + X2 + X3, data = DF)
summary(FIT)
#>
#> Call:
#> lm(formula = Y ~ X1 + X2 + X3, data = DF)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -7.0346 -1.6025 -0.4424 2.4654 7.4917
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 9.0421 2.5775 3.508 0.00209 **
#> X1 2.7930 0.7504 3.722 0.00126 **
#> X2 -1.3054 0.2497 -5.229 3.5e-05 ***
#> X3 -0.2201 0.7340 -0.300 0.76725
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 3.407 on 21 degrees of freedom
#> Multiple R-squared: 0.7028, Adjusted R-squared: 0.6604
#> F-statistic: 16.55 on 3 and 21 DF, p-value: 9.465e-06
FIT_tidy <- broom::tidy(FIT)
FIT_tidy
#> # A tibble: 4 × 5
#> term estimate std.error statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) 9.04 2.58 3.51 0.00209
#> 2 X1 2.79 0.750 3.72 0.00126
#> 3 X2 -1.31 0.250 -5.23 0.0000350
#> 4 X3 -0.220 0.734 -0.300 0.767
library(dplyr)
FIT_tidy <- FIT_tidy |>
mutate(Sig. = case_when(
p.value <= 0.001 ~ "***",
p.value > 0.001 & p.value <= 0.01 ~ "**",
p.value > 0.01 & p.value <= 0.05 ~ "*",
p.value > 0.05 & p.value <= 0.10 ~ ".",
TRUE ~""
))
FIT_tidy
#> # A tibble: 4 × 6
#> term estimate std.error statistic p.value Sig.
#> <chr> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 (Intercept) 9.04 2.58 3.51 0.00209 "**"
#> 2 X1 2.79 0.750 3.72 0.00126 "**"
#> 3 X2 -1.31 0.250 -5.23 0.0000350 "***"
#> 4 X3 -0.220 0.734 -0.300 0.767 ""
VARS <- FIT_tidy |> filter(p.value < 0.01, term != "(Intercept)") |>
select(term)
VARS
#> # A tibble: 2 × 1
#> term
#> <chr>
#> 1 X1
#> 2 X2
RHS <- paste(unlist(VARS), collapse = "+")
FormulaText <- paste("Y", "~", RHS, collapse = "")
FormulaText
#> [1] "Y ~ X1+X2"
Formula <- as.formula(FormulaText)
NewFIT <- lm(Formula, data = DF)
summary(NewFIT)
#>
#> Call:
#> lm(formula = Formula, data = DF)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -6.7347 -1.6479 -0.3427 2.2394 7.6502
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 9.0427 2.5236 3.583 0.00166 **
#> X1 2.7703 0.7309 3.790 0.00100 **
#> X2 -1.3057 0.2444 -5.342 2.31e-05 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 3.336 on 22 degrees of freedom
#> Multiple R-squared: 0.7015, Adjusted R-squared: 0.6744
#> F-statistic: 25.86 on 2 and 22 DF, p-value: 1.674e-06
Created on 2022-11-15 with reprex v2.0.2
I am tracking with you so far and it makes sense what you proposed. However, when I got to the Formula <- as.formula(FormulaText) line, it produced the error message below.
Error in str2lang(x) : :1:368: unexpected numeric constant
Is this due to how FormulaText was constructed?
FJCC
November 16, 2022, 4:31pm
6
What is the value of FormulaText in your code?
Got it figured out, had some dates that were displaying as numbers so relabeled those and it worked! Thank you for your help!
system
Closed
December 7, 2022, 5:48pm
8
This topic was automatically closed 21 days after the last reply. New replies are no longer allowed. If you have a query related to it or one of the replies, start a new topic and refer back with a link.