Regression for data frame

I forgot to mention: please check the homework policy, if applicable.

To debug this type of problem, it helps to look at what there is to work with. First, a diversion to clarify my usage.

One of the hard things to get used to in R is the concept that everything is an object that has properties. Some objects have properties that allow them to operate on other objects to produce new objects. Those are functions.
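A minimal sketch of that idea, using only base R. The names x and y are just illustrative placeholders, not anything from the original question.

```r
# Every value has a class, including functions.
x <- c(1.5, 2.5, 4)   # a numeric vector object
class(x)              # "numeric"
class(mean)           # "function": mean is itself an object
is.function(mean)     # TRUE

# A function object operates on another object to produce a new object.
y <- mean(x)
class(y)              # "numeric"
```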

Think of R as school algebra writ large: f(x) = y. Here f is a function; x is an object (there may be several), termed the argument; and y is an object, termed the value. A value can be as simple as a single number (a length-one atomic vector) or a densely packed object with a multitude of data and labels.
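To make the two ends of that range concrete, here is a small sketch. It uses the built-in cars data frame purely for illustration; y_simple and y_packed are hypothetical names.

```r
# A simple value: a single number is an atomic vector of length one.
y_simple <- sqrt(16)
is.atomic(y_simple)   # TRUE
length(y_simple)      # 1

# A packed value: lm() returns a list-like object with many components.
y_packed <- lm(dist ~ speed, data = cars)
class(y_packed)       # "lm"
is.list(y_packed)     # TRUE
names(y_packed)       # coefficients, residuals, fitted.values, ...
```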

And, because functions are also objects, they can be arguments to other functions, like the old g(f(x)) = y. (Trivia: this is called being a first-class object.)
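As a rough illustration, with f and g as made-up functions that exist only for this example:

```r
# f is an ordinary function object.
f <- function(x) x^2

# g accepts a function as one of its arguments and calls it.
g <- function(fun, x) fun(x) + 1

g(f, 3)          # f(3) + 1

# Base R does the same thing: sapply() takes f as an argument.
sapply(1:3, f)
```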

Although there are function objects in R that operate like control statements in imperative/procedural languages, they are best used "under the hood." As it presents to users interactively, R is a functional programming language. Instead of saying

take this, take that, do this, then do that, then if the result is this one thing, do this other thing, but if not do something else and give me the answer

in the style of most common programming languages, R allows the user to say

use this function to take this argument and turn it into the value I want for a result
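The contrast might look something like this. The vector x and the "big"/"small" labels are invented for the example.

```r
x <- c(3, 8, 1, 12)

# Imperative style: spell out every step with a loop and a branch.
out_loop <- character(length(x))
for (i in seq_along(x)) {
  if (x[i] > 5) {
    out_loop[i] <- "big"
  } else {
    out_loop[i] <- "small"
  }
}

# Functional style: one function turns the argument into the value.
out_fun <- ifelse(x > 5, "big", "small")

identical(out_loop, out_fun)   # both approaches give the same result
```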

The roles in

A  <-  lm(Murder ~ Population + Illiteracy + Income + Frost, data=state.x77)

consist of: <-, a so-called primitive function that works as an assignment operator, sending the return value of lm to the new object A; lm, the function that fits the model; state.x77, the object passed as the data argument; Murder, an object within state.x77; and the ~ operator, which tells lm that Murder is to be modeled by the objects that follow it.
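The formula is itself an object and can be inspected before lm ever sees it. A short sketch, where fm is just a name chosen for this example:

```r
# Store the formula on its own; nothing is evaluated yet.
fm <- Murder ~ Population + Illiteracy + Income + Frost
class(fm)                 # "formula"

# all.vars() lists the variable names the formula refers to;
# the first one is the response on the left of ~.
all.vars(fm)

# Each of those names is a column of state.x77.
all(all.vars(fm) %in% colnames(state.x77))
```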

What do we see when we run the command?

A <- lm(Murder ~ Population + Illiteracy + Income + Frost, data=state.x77)
#> Error in model.frame.default(formula = Murder ~ Population + Illiteracy + : 'data' must be a data.frame, not a matrix or an array

Created on 2020-04-06 by the reprex package (v0.3.0)

This clearly points to state.x77 as the culprit. It's the wrong kind of object.

class(state.x77)
#> [1] "matrix"


matrix ≠ data frame
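The same check can be made with the predicate functions, which may be a bit more robust than comparing class() strings: in R 4.0.0 and later, class() on a matrix returns both "matrix" and "array".

```r
# Ask directly what kind of object state.x77 is.
is.matrix(state.x77)       # TRUE
is.data.frame(state.x77)   # FALSE: this is why lm() rejects it

# inherits() also works regardless of how many classes are reported.
inherits(state.x77, "matrix")
```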

So, what to do?

frame.x77 <- as.data.frame(state.x77)  # convert the matrix to a data frame
class(frame.x77)
#> [1] "data.frame"
A <- lm(Murder ~ Population + Illiteracy + Income + Frost, data=frame.x77)
summary(A)
#> 
#> Call:
#> lm(formula = Murder ~ Population + Illiteracy + Income + Frost, 
#>     data = frame.x77)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -4.7960 -1.6495 -0.0811  1.4815  7.6210 
#> 
#> Coefficients:
#>              Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) 1.235e+00  3.866e+00   0.319   0.7510    
#> Population  2.237e-04  9.052e-05   2.471   0.0173 *  
#> Illiteracy  4.143e+00  8.744e-01   4.738 2.19e-05 ***
#> Income      6.442e-05  6.837e-04   0.094   0.9253    
#> Frost       5.813e-04  1.005e-02   0.058   0.9541    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 2.535 on 45 degrees of freedom
#> Multiple R-squared:  0.567,  Adjusted R-squared:  0.5285 
#> F-statistic: 14.73 on 4 and 45 DF,  p-value: 9.133e-08

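If the intermediate object is not needed, the conversion can also be done inline, and pieces of the fitted model can be pulled out with accessor functions. This is only a sketch; A2 is a hypothetical name used to avoid overwriting A.

```r
# Convert inside the call instead of creating frame.x77 first.
A2 <- lm(Murder ~ Population + Illiteracy + Income + Frost,
         data = as.data.frame(state.x77))

# The fit is an object too, so functions can extract parts of it.
coef(A2)                   # named vector of the five estimates
coef(A2)[["Illiteracy"]]   # the estimate shown in the summary table
```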