Creating a scatter plot and the lm function.

I get the following error message when I try to use the lm() function:

```
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
NA/NaN/Inf in 'y'
In addition: Warning message:
In storage.mode(v) <- "double" : NAs introduced by coercion
```

This is my code:

```r
rm(list=ls())
COVID19_DATA_3 <- read.csv("C:/Users/User.DESKTOP-IVGA5BC/Desktop/COVID19_DATA_3.csv", header=FALSE)
COVID19_DATA_3$V8=as.character(COVID19_DATA_3$V8)
COVID19_DATA_3$V11=as.character(COVID19_DATA_3$V11)

plot(COVID19_DATA_3$V11,COVID19_DATA_3$V8)
plot(COVID19_DATA_3$V11,COVID19_DATA_3$V8, ylim=c(0,2000))
plot(COVID19_DATA_3$V11,COVID19_DATA_3$V8, xlim=c(0,50),ylim=c(0,500))

fit <-lm(COVID19_DATA_3$V11~COVID19_DATA_3$V8)
plot(COVID19_DATA_3$V11,COVID19_DATA_3$V8)
abline(fit, col = "blue", lwd=1)
```

My data consists of columns V8 and V11, and they contain ONLY integers and NA values. I'm not sure whether as.character() is the wrong function to use, but when I tried without it, the scatter plot showed a series of lines instead of dots.
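A minimal sketch of what goes wrong, using a small made-up CSV in place of the original file (the header names `x_count` and `y_count` and all values are hypothetical). When a file with a text header row is read with `header = FALSE`, the header words become an ordinary first data row, so the columns cannot be stored as numbers whether or not `as.character()` is applied afterwards:

```r
# Hypothetical stand-in for the real file: a text header row plus integer data
csv_text <- "x_count,y_count
10,0
15,1
22,NA
30,2
41,3"

# header = FALSE turns the header words into an ordinary data row
d <- read.csv(text = csv_text, header = FALSE)

# The columns are text (character, or factor in R < 4.0), not numbers
print(sapply(d, class))

# Converting to numbers turns the header word into NA (with a warning)
x_num <- suppressWarnings(as.numeric(as.character(d$V1)))
print(x_num)

# Mirroring the as.character() calls in the question: lm() coerces the
# response to double internally, the header word becomes NA, and lm.fit() stops
res <- tryCatch(
  suppressWarnings(lm(as.character(d$V2) ~ as.character(d$V1))),
  error = function(e) e
)
if (inherits(res, "error")) print(conditionMessage(res))
```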

This is a brief section of what my data look like (V8 and V11 only have integer values and NA values):


Referred here from support.rstudio.com

I see a few problems.

  1. In your call to read.csv(), you set header = FALSE, but the image of your data shows text in the first row. That text is then read as an ordinary data row, so V8 and V11 are stored as text rather than numbers. I suggest you edit the original csv file to make the header text useful, with no spaces in the headers, and then set header = TRUE in read.csv(). Alternatively, you can remove the headers from the csv file and leave the call to read.csv() as it is. If you keep the headers, you will have to edit your later code to refer to the new column names instead of V8 and V11.
  2. Using as.character() will spoil the regression, because it turns the columns into text instead of numbers. After you fix the header problem, you should delete the as.character() lines.
  3. Your plot() calls put V11 on the x axis and V8 on the y axis, but your call to lm() uses COVID19_DATA_3$V11 ~ COVID19_DATA_3$V8, which defines V11 as the dependent variable, as if it were on the y axis. I think you want COVID19_DATA_3$V8 ~ COVID19_DATA_3$V11.
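Putting the three fixes together, a minimal sketch might look like this. It uses a small made-up CSV instead of the original file; the header names `x_count` (standing in for V11, on the x axis) and `y_count` (standing in for V8, on the y axis) are hypothetical. With the real file, you would pass its path to read.csv() instead of `text =`.

```r
# Hypothetical stand-in for the real file, with a clean one-word header row.
# With the real file: read.csv("C:/Users/User.DESKTOP-IVGA5BC/Desktop/COVID19_DATA_3.csv", header = TRUE)
csv_text <- "x_count,y_count
10,0
15,1
22,NA
30,2
41,3"

# header = TRUE: the first row supplies column names, so the data stay numeric
covid <- read.csv(text = csv_text, header = TRUE)

# No as.character(): keep the columns numeric.
# Response on the left of ~, predictor on the right, matching plot(x, y).
# Rows containing NA are dropped by lm()'s default na.action (na.omit).
fit <- lm(y_count ~ x_count, data = covid)
print(coef(fit))

# Scatter plot with the predictor on the x axis, then the fitted line
plot(covid$x_count, covid$y_count, xlab = "x_count", ylab = "y_count")
abline(fit, col = "blue", lwd = 1)
```

Using the `data =` argument with bare column names keeps the formula short and makes it easy to see which variable is the response.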

That works. Thanks very much!!!
