Hi,
I am using the Insurance Company (TIC) Benchmark dataset. The data frame has 5822 observations of 86 variables. The column descriptions are at the link below.
http://liacs.leidenuniv.nl/~puttenpwhvander/library/cc2000/data.html
I loaded the data from a CSV file. Except for the dependent variable (Purchase), which is of type chr, all variables were loaded as integer. A few examples are below. Dataset name: data
AWAOREG : int 0 0 0 0 0 0 0 0 0 0 ...
ABRAND : int 1 1 1 1 1 0 0 0 0 1 ...
AZEILPL : int 0 0 0 0 0 0 0 0 0 0 ...
APLEZIER: int 0 0 0 0 0 0 0 0 0 0 ...
AFIETS : int 0 0 0 0 0 0 0 0 0 0 ...
AINBOED : int 0 0 0 0 0 0 0 0 0 0 ...
ABYSTAND: int 0 0 0 0 0 0 0 0 0 0 ...
Purchase: chr "No" "No" "No" "No" ...
I also converted the CSV to an xlsx file (using MS Excel). When that file is loaded into RStudio, all variables except Purchase (chr) are loaded as numeric. A few examples are below. Dataset name: my_data
AWAOREG : num [1:5822] 0 0 0 0 0 0 0 0 0 0 ...
ABRAND : num [1:5822] 1 1 1 1 1 0 0 0 0 1 ...
AZEILPL : num [1:5822] 0 0 0 0 0 0 0 0 0 0 ...
APLEZIER: num [1:5822] 0 0 0 0 0 0 0 0 0 0 ...
AFIETS : num [1:5822] 0 0 0 0 0 0 0 0 0 0 ...
AINBOED : num [1:5822] 0 0 0 0 0 0 0 0 0 0 ...
ABYSTAND: num [1:5822] 0 0 0 0 0 0 0 0 0 0 ...
Purchase: chr [1:5822] "No" "No" "No" "No" ...
In either case, I am trying to fit a simple linear model with the code below, and it fails with an error. There are no NA values in the dataset.
lm_model = lm(Purchase~., data=my_data)
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
NA/NaN/Inf in 'y'
In addition: Warning message:
In storage.mode(v) <- "double" : NAs introduced by coercion
lm_model_1 = lm(Purchase~., data=data)
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
NA/NaN/Inf in 'y'
In addition: Warning message:
In storage.mode(v) <- "double" : NAs introduced by coercion
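For reference, the same error can be reproduced on a small made-up data frame, which suggests the cause is the chr response rather than the int/num difference between the two files. This is only a sketch under assumptions: the `toy` data frame and its values are invented, only two of the predictor columns are imitated, and the positive label is assumed to be "Yes" (only "No" is visible in the output above). lm() needs a numeric response, so one option is to recode Purchase as 0/1; for a binary outcome, a logistic regression via glm() on a factor response may be the more natural fit.

```r
# Toy data standing in for the real data frame (all values are made up)
set.seed(1)
toy <- data.frame(
  ABRAND   = rbinom(50, 1, 0.5),
  AFIETS   = rbinom(50, 1, 0.2),
  Purchase = sample(c("No", "Yes"), 50, replace = TRUE),  # "Yes" is an assumed label
  stringsAsFactors = FALSE                                # keep Purchase as chr
)

# A character response cannot be coerced to numeric, so lm() fails.
# The coercion warning is suppressed here; only the error message is kept.
res <- tryCatch(
  suppressWarnings(lm(Purchase ~ ., data = toy)),
  error = function(e) conditionMessage(e)
)
print(res)

# Option 1: recode the response as 0/1 so lm() receives a numeric y
toy_num <- toy
toy_num$Purchase <- as.integer(toy$Purchase == "Yes")
fit_lm <- lm(Purchase ~ ., data = toy_num)

# Option 2: keep the response categorical (factor) and fit a logistic regression
toy_fac <- toy
toy_fac$Purchase <- factor(toy$Purchase, levels = c("No", "Yes"))
fit_glm <- glm(Purchase ~ ., data = toy_fac, family = binomial)
```

If this is indeed the cause, the same recoding applied to `data` or `my_data` should let the model fit, whichever file the data came from.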
I also need to build a few other models using LR, NB, and KNN and compare their performance.
Any help is appreciated.