Creating a scatter plot and the lm function.

Haleylololo · May 30, 2020, 7:15pm

An error message comes up when I try to use the lm function:

Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
NA/NaN/Inf in 'y'
In addition: Warning message:
In storage.mode(v) <- "double" : NAs introduced by coercion

This is my code:

rm(list=ls())
COVID19_DATA_3 <- read.csv("C:/Users/User.DESKTOP-IVGA5BC/Desktop/COVID19_DATA_3.csv", header=FALSE)
COVID19_DATA_3$V8=as.character(COVID19_DATA_3$V8)
COVID19_DATA_3$V11=as.character(COVID19_DATA_3$V11)

plot(COVID19_DATA_3$V11,COVID19_DATA_3$V8)
plot(COVID19_DATA_3$V11,COVID19_DATA_3$V8, ylim=c(0,2000))
plot(COVID19_DATA_3$V11,COVID19_DATA_3$V8, xlim=c(0,50),ylim=c(0,500))

fit <-lm(COVID19_DATA_3$V11~COVID19_DATA_3$V8)
plot(COVID19_DATA_3$V11,COVID19_DATA_3$V8)
abline(fit, col = "blue", lwd=1)

My data consists of columns V8 and V11, which contain ONLY integers and NA values. I'm not sure whether using as.character is wrong, but when I tried without it, the scatter plot showed a series of lines instead of dots.

This is a brief sample of what my data looks like. (V8 and V11 contain only integer values and NA values.)

Referred here from support.rstudio.com

FJCC · May 30, 2020, 9:47pm

I see a few problems.

1. In your call to read.csv, you set header = FALSE but the image of your data shows text in the first row. I suggest you edit the original csv file to make the header text useful, with no spaces in the headers, and then set header = TRUE in read.csv(). Alternatively, you can remove the headers from the csv file and leave the call to read.csv as it is. If you keep the headers, you will have to edit your later code to refer to the new column names instead of V8 and V11.
2. The use of as.character will spoil the regression. After you fix the header problem, you should delete the as.character lines.
3. Your plot() calls put V11 on the x axis and V8 on the y axis but your call to lm() uses COVID19_DATA_3$V11~COVID19_DATA_3$V8 which defines V11 as the dependent variable, as if it were on the y axis. I think you want to make that COVID19_DATA_3$V8~COVID19_DATA_3$V11.
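
Putting those three points together, here is a minimal sketch of what the corrected workflow might look like. It does not use the real file: it writes a small made-up csv to a temporary path, and the column names cases and deaths are hypothetical stand-ins for whatever V11 and V8 actually hold, so adjust them to your own headers.

```r
# Hypothetical stand-in for the real csv: a header row, integers, and NAs.
# The names "cases" and "deaths" are assumptions, not taken from the thread.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("cases,deaths",
             "10,1",
             "20,3",
             "NA,NA",
             "40,4",
             "50,6"),
           csv_path)

# With header = FALSE the header text is read as a data row,
# so the columns cannot be stored as numbers.
bad <- read.csv(csv_path, header = FALSE)
is.numeric(bad$V1)

# With header = TRUE the columns are read as integers, NAs included,
# and no as.character() conversion is needed.
covid <- read.csv(csv_path, header = TRUE)
str(covid)

# Keep the axes consistent: cases on x and deaths on y in both calls.
plot(covid$cases, covid$deaths, xlab = "cases", ylab = "deaths")
fit <- lm(deaths ~ cases, data = covid)  # rows containing NA are dropped by default
abline(fit, col = "blue", lwd = 1)
```

Passing data = covid to lm() and using bare column names in the formula is optional, but it keeps the formula short and makes it easier to see which variable is the response.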

Haleylololo · May 31, 2020, 9:29am

That works. Thanks very much!!!
