Multiple Regression - imported data and NA values error

Hi all,

I'm brand new to R, attempting to use it for a paper for admission to J-school. I have a data set on city budgets with several different variables, a few of which are blank when I couldn't find reported numbers or no such numbers existed. Some variables are dollar amounts, some are numbers, some are percents. My code is:

mydata <- read.csv('C:/Users/Me/Documents/Research.csv', header = TRUE)
model <- lm(Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7, data = mydata, na.action = na.omit)

And my error message relates to the NA values, which I just want the model to ignore:

Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
NA/NaN/Inf in 'y'
In addition: Warning message:
In storage.mode(v) <- "double" : NAs introduced by coercion

Cool, okay, so it turns out all my data is character instead of numeric, and a Google search on the issue tells me I need to convert it all. So I used as.numeric like so:

mydata$X1 <- as.numeric(mydata$X1)

And then the values in my table display as all NA instead of numbers?? What am I missing?

Please post the result of

dput(head(mydata))

I suspect you have things like thousands separators and currency signs in the data but seeing the actual content of a few rows will allow specific advice about how to clean up the data.
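In case it helps while waiting for the dput() output, here is a rough sketch of what that cleanup could look like inside R, assuming the problem really is currency signs, thousands separators and percent signs. The data frame `raw` and the helper `to_number` are made up for illustration and are not from the original file. as.numeric() returns NA for any string it cannot parse as a plain number, so "$1,200" becomes NA until the extra characters are stripped.

```r
# Hypothetical sample that mimics a spreadsheet export with formatted numbers
raw <- data.frame(
  Y  = c("$1,200", "$950", "", "$3,400.50"),
  X1 = c("12%", "8.5%", "10%", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only digits, decimal points and minus signs,
# then convert. Blank cells are turned into NA explicitly.
to_number <- function(x) {
  cleaned <- gsub("[^0-9.-]", "", x)
  cleaned[!nzchar(cleaned)] <- NA
  as.numeric(cleaned)
}

# Apply the helper to every column and rebuild the data frame
clean <- as.data.frame(lapply(raw, to_number))
str(clean)
```

Note that this turns "12%" into 12, not 0.12, so divide by 100 afterwards if the model should see proportions. It also assumes every column is meant to be numeric; a column of city names would need to be left out.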


That was it! I went back to the file and cleaned it up, and re-imported it. All my data is now int and num values and my regression works!! Thank you for the tip!
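For anyone landing here later, a quick way to confirm the import worked before fitting is to check the class of each column. This is only a sketch with invented numbers and two placeholder predictors, not the actual budget data. It also shows that lm() with na.action = na.omit simply drops any row containing an NA once the columns are numeric.

```r
# Hypothetical cleaned data; values and column names are placeholders
mydata <- data.frame(
  Y  = c(10.2, 11.9, NA, 16.1, 18.3, 19.8),
  X1 = c(1, 2, 3, 4, 5, 6),
  X2 = c(3L, 1L, 4L, NA, 5L, 9L)
)

# Every column should report "numeric" or "integer", not "character"
sapply(mydata, class)

# Rows with an NA in any model variable are dropped before fitting
model <- lm(Y ~ X1 + X2, data = mydata, na.action = na.omit)

# Number of rows actually used in the fit (two of the six contain an NA)
nobs(model)
```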
