Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...)

Hi,

I am using the Insurance Company (TIC) Benchmark dataset. The data frame has 5822 observations of 86 variables. Column descriptions are at the link below.

http://liacs.leidenuniv.nl/~puttenpwhvander/library/cc2000/data.html

I loaded the data from a csv file. Except for the dependent variable (Purchase), which is of type chr, all variables are loaded as integer. A few examples are below. Dataset name: data

AWAOREG : int 0 0 0 0 0 0 0 0 0 0 ...
ABRAND : int 1 1 1 1 1 0 0 0 0 1 ...
AZEILPL : int 0 0 0 0 0 0 0 0 0 0 ...
APLEZIER: int 0 0 0 0 0 0 0 0 0 0 ...
AFIETS : int 0 0 0 0 0 0 0 0 0 0 ...
AINBOED : int 0 0 0 0 0 0 0 0 0 0 ...
ABYSTAND: int 0 0 0 0 0 0 0 0 0 0 ...
Purchase: chr "No" "No" "No" "No" ...

I also converted the csv to an xlsx file (using MS Excel). When that file is loaded into RStudio, all variables except Purchase, which is chr, are loaded as numeric. A few examples are below. Dataset name: my_data

AWAOREG : num [1:5822] 0 0 0 0 0 0 0 0 0 0 ...
ABRAND : num [1:5822] 1 1 1 1 1 0 0 0 0 1 ...
AZEILPL : num [1:5822] 0 0 0 0 0 0 0 0 0 0 ...
APLEZIER: num [1:5822] 0 0 0 0 0 0 0 0 0 0 ...
AFIETS : num [1:5822] 0 0 0 0 0 0 0 0 0 0 ...
AINBOED : num [1:5822] 0 0 0 0 0 0 0 0 0 0 ...
ABYSTAND: num [1:5822] 0 0 0 0 0 0 0 0 0 0 ...
Purchase: chr [1:5822] "No" "No" "No" "No" ...

In either case, I am trying to run a simple linear model with the code below, and it is not working. There are no NA values in the dataset.

lm_model = lm(Purchase~., data=my_data)
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
NA/NaN/Inf in 'y'
In addition: Warning message:
In storage.mode(v) <- "double" : NAs introduced by coercion

lm_model_1 = lm(Purchase~., data=data)
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
NA/NaN/Inf in 'y'
In addition: Warning message:
In storage.mode(v) <- "double" : NAs introduced by coercion

I also need to create a few other models using LR, NB and KNN and compare their performance.

Any help is appreciated.

Start by eliminating the trips through csv and xlsx. The same data ships as Caravan in the ISLR package.

library(ISLR)
my_data <- Caravan
str(my_data)
#> 'data.frame':    5822 obs. of  86 variables:
#>  $ MOSTYPE : num  33 37 37 9 40 23 39 33 33 11 ...
#>  $ MAANTHUI: num  1 1 1 1 1 1 2 1 1 2 ...
#>  $ MGEMOMV : num  3 2 2 3 4 2 3 2 2 3 ...
#>  $ MGEMLEEF: num  2 2 2 3 2 1 2 3 4 3 ...
#>  $ MOSHOOFD: num  8 8 8 3 10 5 9 8 8 3 ...
#>  $ MGODRK  : num  0 1 0 2 1 0 2 0 0 3 ...
#>  $ MGODPR  : num  5 4 4 3 4 5 2 7 1 5 ...
#>  $ MGODOV  : num  1 1 2 2 1 0 0 0 3 0 ...
#>  $ MGODGE  : num  3 4 4 4 4 5 5 2 6 2 ...
#>  $ MRELGE  : num  7 6 3 5 7 0 7 7 6 7 ...
#>  $ MRELSA  : num  0 2 2 2 1 6 2 2 0 0 ...
#>  $ MRELOV  : num  2 2 4 2 2 3 0 0 3 2 ...
#>  $ MFALLEEN: num  1 0 4 2 2 3 0 0 3 2 ...
#>  $ MFGEKIND: num  2 4 4 3 4 5 3 5 3 2 ...
#>  $ MFWEKIND: num  6 5 2 4 4 2 6 4 3 6 ...
#>  $ MOPLHOOG: num  1 0 0 3 5 0 0 0 0 0 ...
#>  $ MOPLMIDD: num  2 5 5 4 4 5 4 3 1 4 ...
#>  $ MOPLLAAG: num  7 4 4 2 0 4 5 6 8 5 ...
#>  $ MBERHOOG: num  1 0 0 4 0 2 0 2 1 2 ...
#>  $ MBERZELF: num  0 0 0 0 5 0 0 0 1 0 ...
#>  $ MBERBOER: num  1 0 0 0 4 0 0 0 0 0 ...
#>  $ MBERMIDD: num  2 5 7 3 0 4 4 2 1 3 ...
#>  $ MBERARBG: num  5 0 0 1 0 2 1 5 8 3 ...
#>  $ MBERARBO: num  2 4 2 2 0 2 5 2 1 3 ...
#>  $ MSKA    : num  1 0 0 3 9 2 0 2 1 1 ...
#>  $ MSKB1   : num  1 2 5 2 0 2 1 1 1 2 ...
#>  $ MSKB2   : num  2 3 0 1 0 2 4 2 0 1 ...
#>  $ MSKC    : num  6 5 4 4 0 4 5 5 8 4 ...
#>  $ MSKD    : num  1 0 0 0 0 2 0 2 1 2 ...
#>  $ MHHUUR  : num  1 2 7 5 4 9 6 0 9 0 ...
#>  $ MHKOOP  : num  8 7 2 4 5 0 3 9 0 9 ...
#>  $ MAUT1   : num  8 7 7 9 6 5 8 4 5 6 ...
#>  $ MAUT2   : num  0 1 0 0 2 3 0 4 2 1 ...
#>  $ MAUT0   : num  1 2 2 0 1 3 1 2 3 2 ...
#>  $ MZFONDS : num  8 6 9 7 5 9 9 6 7 6 ...
#>  $ MZPART  : num  1 3 0 2 4 0 0 3 2 3 ...
#>  $ MINKM30 : num  0 2 4 1 0 5 4 2 7 2 ...
#>  $ MINK3045: num  4 0 5 5 0 2 3 5 2 3 ...
#>  $ MINK4575: num  5 5 0 3 9 3 3 3 1 3 ...
#>  $ MINK7512: num  0 2 0 0 0 0 0 0 0 1 ...
#>  $ MINK123M: num  0 0 0 0 0 0 0 0 0 0 ...
#>  $ MINKGEM : num  4 5 3 4 6 3 3 3 2 4 ...
#>  $ MKOOPKLA: num  3 4 4 4 3 3 5 3 3 7 ...
#>  $ PWAPART : num  0 2 2 0 0 0 0 0 0 2 ...
#>  $ PWABEDR : num  0 0 0 0 0 0 0 0 0 0 ...
#>  $ PWALAND : num  0 0 0 0 0 0 0 0 0 0 ...
#>  $ PPERSAUT: num  6 0 6 6 0 6 6 0 5 0 ...
#>  $ PBESAUT : num  0 0 0 0 0 0 0 0 0 0 ...
#>  $ PMOTSCO : num  0 0 0 0 0 0 0 0 0 0 ...
#>  $ PVRAAUT : num  0 0 0 0 0 0 0 0 0 0 ...
#>  $ PAANHANG: num  0 0 0 0 0 0 0 0 0 0 ...
#>  $ PTRACTOR: num  0 0 0 0 0 0 0 0 0 0 ...
#>  $ PWERKT  : num  0 0 0 0 0 0 0 0 0 0 ...
#>  $ PBROM   : num  0 0 0 0 0 0 0 3 0 0 ...
#>  $ PLEVEN  : num  0 0 0 0 0 0 0 0 0 0 ...
#>  $ PPERSONG: num  0 0 0 0 0 0 0 0 0 0 ...
#>  $ PGEZONG : num  0 0 0 0 0 0 0 0 0 0 ...
#>  $ PWAOREG : num  0 0 0 0 0 0 0 0 0 0 ...
#>  $ PBRAND  : num  5 2 2 2 6 0 0 0 0 3 ...
#>  $ PZEILPL : num  0 0 0 0 0 0 0 0 0 0 ...
#>  $ PPLEZIER: num  0 0 0 0 0 0 0 0 0 0 ...
#>  $ PFIETS  : num  0 0 0 0 0 0 0 0 0 0 ...
#>  $ PINBOED : num  0 0 0 0 0 0 0 0 0 0 ...
#>  $ PBYSTAND: num  0 0 0 0 0 0 0 0 0 0 ...
#>  $ AWAPART : num  0 2 1 0 0 0 0 0 0 1 ...
#>  $ AWABEDR : num  0 0 0 0 0 0 0 0 0 0 ...
#>  $ AWALAND : num  0 0 0 0 0 0 0 0 0 0 ...
#>  $ APERSAUT: num  1 0 1 1 0 1 1 0 1 0 ...
#>  $ ABESAUT : num  0 0 0 0 0 0 0 0 0 0 ...
#>  $ AMOTSCO : num  0 0 0 0 0 0 0 0 0 0 ...
#>  $ AVRAAUT : num  0 0 0 0 0 0 0 0 0 0 ...
#>  $ AAANHANG: num  0 0 0 0 0 0 0 0 0 0 ...
#>  $ ATRACTOR: num  0 0 0 0 0 0 0 0 0 0 ...
#>  $ AWERKT  : num  0 0 0 0 0 0 0 0 0 0 ...
#>  $ ABROM   : num  0 0 0 0 0 0 0 1 0 0 ...
#>  $ ALEVEN  : num  0 0 0 0 0 0 0 0 0 0 ...
#>  $ APERSONG: num  0 0 0 0 0 0 0 0 0 0 ...
#>  $ AGEZONG : num  0 0 0 0 0 0 0 0 0 0 ...
#>  $ AWAOREG : num  0 0 0 0 0 0 0 0 0 0 ...
#>  $ ABRAND  : num  1 1 1 1 1 0 0 0 0 1 ...
#>  $ AZEILPL : num  0 0 0 0 0 0 0 0 0 0 ...
#>  $ APLEZIER: num  0 0 0 0 0 0 0 0 0 0 ...
#>  $ AFIETS  : num  0 0 0 0 0 0 0 0 0 0 ...
#>  $ AINBOED : num  0 0 0 0 0 0 0 0 0 0 ...
#>  $ ABYSTAND: num  0 0 0 0 0 0 0 0 0 0 ...
#>  $ Purchase: Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...

Created on 2020-08-25 by the reprex package (v0.3.0)

Then test with a toy model

library(ISLR)
my_data <- Caravan
fit <- lm(Purchase ~ AWAOREG + ABRAND, data = my_data)
#> Warning in model.response(mf, "numeric"): using type = "numeric" with a factor
#> response will be ignored
#> Warning in Ops.factor(y, z$residuals): '-' not meaningful for factors
summary(fit)
#> Warning in Ops.factor(r, 2): '^' not meaningful for factors
#> 
#> Call:
#> lm(formula = Purchase ~ AWAOREG + ABRAND, data = my_data)
#> 
#> Residuals:
#> Error in quantile.default(resid): factors are not allowed

Created on 2020-08-25 by the reprex package (v0.3.0)

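Notice the warnings regarding factors: lm expects a numeric response, and Purchase is a factor. (When Purchase is read in as chr, as in the csv and xlsx imports, coercing "No" and "Yes" to double produces NA, which is where the NA/NaN/Inf in 'y' error comes from.)

Notice also that Y in Y ~ X_i ... X_n is binary, not continuous.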

Compare with a logistic regression fitted by glm:

library(ISLR)
my_data <- Caravan
fit <- glm(Purchase ~ AWAOREG + ABRAND, data = my_data, family = "binomial")
summary(fit)
#> 
#> Call:
#> glm(formula = Purchase ~ AWAOREG + ABRAND, family = "binomial", 
#>     data = my_data)
#> 
#> Deviance Residuals: 
#>     Min       1Q   Median       3Q      Max  
#> -1.1338  -0.3774  -0.3774  -0.3082   2.4782  
#> 
#> Coefficients:
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept) -3.02326    0.08365 -36.140  < 2e-16 ***
#> AWAOREG      0.64537    0.46204   1.397    0.162    
#> ABRAND       0.41711    0.09058   4.605 4.13e-06 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 2635.5  on 5821  degrees of freedom
#> Residual deviance: 2613.3  on 5819  degrees of freedom
#> AIC: 2619.3
#> 
#> Number of Fisher Scoring iterations: 5

Created on 2020-08-25 by the reprex package (v0.3.0)
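
For the csv and xlsx imports, where Purchase arrives as chr rather than as a factor, the same fix probably needs one extra conversion step. A minimal sketch on made-up data follows; toy_data and its values are hypothetical stand-ins, not the TIC data.

```r
# Hypothetical stand-in for the csv import: integer predictor, character response.
toy_data <- data.frame(
  ABRAND   = rep(c(0L, 1L), each = 10),
  Purchase = c(rep("No", 8), rep("Yes", 2), rep("No", 5), rep("Yes", 5)),
  stringsAsFactors = FALSE
)

# Coercing "No"/"Yes" to numeric gives NA for every row; this is what
# lm.fit reports as "NA/NaN/Inf in 'y'".
y_coerced <- suppressWarnings(as.numeric(toy_data$Purchase))
all(is.na(y_coerced))

# Convert the response to a factor; glm treats the first level ("No")
# as failure and models the probability of "Yes".
toy_data$Purchase <- factor(toy_data$Purchase, levels = c("No", "Yes"))
toy_fit <- glm(Purchase ~ ABRAND, data = toy_data, family = "binomial")
coef(toy_fit)
```

The same factor() call should work on my_data or data before calling glm, assuming the column holds only "No" and "Yes".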

Any NA that still appears in the output of the full model, with all the available variables, probably reflects collinearity among the predictors rather than the presence of NA values in my_data.

The analyst has a choice among three principal alternatives in designing a model with multiple variables:

  1. Step-forward addition
  2. Step-backward subtraction
  3. Selection informed by domain knowledge

Whichever method is chosen, the focus should be on improving the residual deviance and AIC, or a similar criterion, with a value of \alpha fixed in advance (to avoid p-hacking).
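
As a rough illustration of the first two alternatives, base R's step() adds or drops terms by AIC. The sketch below uses simulated data; sim, x1, x2, x3 and the coefficients are made up for the example and have nothing to do with Caravan.

```r
# Simulated data: only x1 truly affects the binary response (an assumption
# of this toy example).
set.seed(42)
n <- 500
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$y <- rbinom(n, size = 1, prob = plogis(-1 + 1.5 * sim$x1))

null_fit <- glm(y ~ 1, data = sim, family = "binomial")
full_fit <- glm(y ~ x1 + x2 + x3, data = sim, family = "binomial")

# Step-forward addition: start from the intercept-only model and add
# terms while AIC improves.
forward_fit <- step(null_fit,
                    scope = list(lower = ~ 1, upper = ~ x1 + x2 + x3),
                    direction = "forward", trace = 0)

# Step-backward subtraction: start from the full model and drop
# terms while AIC improves.
backward_fit <- step(full_fit, direction = "backward", trace = 0)

AIC(null_fit, forward_fit, backward_fit, full_fit)
```

With 85 predictors, as in Caravan, the same calls should work but will take noticeably longer, and collinear predictors may still need to be removed first.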

Following that, goodness of fit should be evaluated. See my post for more, including log likelihood and its role in logistic regression. Also, pay special attention to Fig. 1.1 for what to expect when using lm with a binary response variable.
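
One simple check that follows from the log likelihood is a likelihood ratio test of the fitted model against the intercept-only model. The sketch below reuses the deviances and degrees of freedom printed in the two-variable glm summary above; the variable names are only for illustration.

```r
# Values copied from the summary of glm(Purchase ~ AWAOREG + ABRAND, ...)
null_dev  <- 2635.5
resid_dev <- 2613.3
null_df   <- 5821
resid_df  <- 5819

# The drop in deviance is the likelihood ratio statistic; under the null
# hypothesis it is approximately chi-squared with df equal to the number
# of added parameters.
lr_stat <- null_dev - resid_dev
lr_df   <- null_df - resid_df
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

c(lr_stat = lr_stat, lr_df = lr_df, p_value = p_value)
```

A small p-value here only says that the two predictors together improve on the null model; it says nothing about how well the model classifies purchasers.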

Hi, Thanks much for the response.

Regarding the xls/csv route: thanks for the above approach. It helps.

Before I create models, I need to check for multicollinearity, and that is the very reason I tried lm(). With it I can compute the VIF for the model and conclude whether there is any multicollinearity.

If I try to find the correlation, I get the error below. After importing the dataset from the ISLR package, all variables are numeric except the target variable. The lm() function also gives the warnings below.

I am not sure how to resolve this. With another dataset I was able to run lm() without problems.

cor(my_data)
Error in cor(my_data) : 'x' must be numeric
lm_model = lm(Purchase~., data=my_data)
Warning messages:
1: In model.response(mf, "numeric") :
using type = "numeric" with a factor response will be ignored
2: In Ops.factor(y, z$residuals) : ‘-’ not meaningful for factors

R has many strengths, such as functional programming, but the greatest is that it was created by statisticians, for statisticians. It's safe to assert without checking that there exist one or more well-tested functions for every standard statistical problem. Two useful ways to find them are rseek.org and CRAN task views.

`vif{car}` provides a solution.

suppressPackageStartupMessages({library(ISLR)
                               library(car)})
my_data <- Caravan
fit <- glm(Purchase ~ ., data = my_data, family = "binomial")
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(fit)
#> 
#> Call:
#> glm(formula = Purchase ~ ., family = "binomial", data = my_data)
#> 
#> Deviance Residuals: 
#>     Min       1Q   Median       3Q      Max  
#> -1.7047  -0.3711  -0.2450  -0.1588   3.2916  
#> 
#> Coefficients:
#>               Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  2.542e+02  1.116e+04   0.023  0.98183    
#> MOSTYPE      6.580e-02  4.624e-02   1.423  0.15468    
#> MAANTHUI    -1.832e-01  1.927e-01  -0.951  0.34157    
#> MGEMOMV     -2.696e-02  1.399e-01  -0.193  0.84723    
#> MGEMLEEF     2.096e-01  1.016e-01   2.063  0.03911 *  
#> MOSHOOFD    -2.767e-01  2.076e-01  -1.333  0.18247    
#> MGODRK      -1.142e-01  1.069e-01  -1.068  0.28535    
#> MGODPR      -1.910e-02  1.177e-01  -0.162  0.87112    
#> MGODOV      -1.618e-02  1.055e-01  -0.153  0.87818    
#> MGODGE      -6.817e-02  1.113e-01  -0.612  0.54024    
#> MRELGE       2.310e-01  1.566e-01   1.475  0.14031    
#> MRELSA       8.509e-02  1.466e-01   0.580  0.56169    
#> MRELOV       1.467e-01  1.562e-01   0.939  0.34759    
#> MFALLEEN    -8.291e-02  1.311e-01  -0.633  0.52702    
#> MFGEKIND    -1.154e-01  1.337e-01  -0.863  0.38813    
#> MFWEKIND    -8.140e-02  1.417e-01  -0.575  0.56561    
#> MOPLHOOG     9.717e-04  1.311e-01   0.007  0.99408    
#> MOPLMIDD    -9.077e-02  1.365e-01  -0.665  0.50605    
#> MOPLLAAG    -1.994e-01  1.376e-01  -1.449  0.14740    
#> MBERHOOG     8.883e-02  9.349e-02   0.950  0.34204    
#> MBERZELF     3.918e-02  9.897e-02   0.396  0.69219    
#> MBERBOER    -1.169e-01  1.104e-01  -1.059  0.28951    
#> MBERMIDD     1.353e-01  9.191e-02   1.472  0.14106    
#> MBERARBG     3.976e-02  9.067e-02   0.438  0.66104    
#> MBERARBO     9.954e-02  9.143e-02   1.089  0.27628    
#> MSKA         2.690e-02  1.035e-01   0.260  0.79502    
#> MSKB1       -8.801e-03  1.011e-01  -0.087  0.93064    
#> MSKB2        1.200e-02  9.081e-02   0.132  0.89485    
#> MSKC         9.016e-02  9.958e-02   0.905  0.36527    
#> MSKD        -2.468e-02  9.724e-02  -0.254  0.79967    
#> MHHUUR      -1.472e+01  8.140e+02  -0.018  0.98557    
#> MHKOOP      -1.469e+01  8.140e+02  -0.018  0.98561    
#> MAUT1        1.819e-01  1.514e-01   1.202  0.22953    
#> MAUT2        1.507e-01  1.371e-01   1.099  0.27162    
#> MAUT0        9.325e-02  1.436e-01   0.649  0.51603    
#> MZFONDS     -1.445e+01  9.359e+02  -0.015  0.98768    
#> MZPART      -1.451e+01  9.359e+02  -0.016  0.98763    
#> MINKM30      1.181e-01  1.006e-01   1.174  0.24039    
#> MINK3045     1.366e-01  9.650e-02   1.415  0.15694    
#> MINK4575     1.009e-01  9.667e-02   1.043  0.29678    
#> MINK7512     1.144e-01  1.027e-01   1.114  0.26513    
#> MINK123M    -1.607e-01  1.449e-01  -1.109  0.26738    
#> MINKGEM      9.214e-02  9.945e-02   0.927  0.35417    
#> MKOOPKLA     6.856e-02  4.642e-02   1.477  0.13966    
#> PWAPART      5.954e-01  3.901e-01   1.526  0.12693    
#> PWABEDR     -2.757e-01  4.635e-01  -0.595  0.55196    
#> PWALAND     -4.405e-01  1.035e+00  -0.425  0.67052    
#> PPERSAUT     2.306e-01  4.199e-02   5.491 4.01e-08 ***
#> PBESAUT      1.215e+01  4.029e+02   0.030  0.97595    
#> PMOTSCO     -8.101e-02  1.147e-01  -0.706  0.48006    
#> PVRAAUT     -2.106e+00  2.557e+03  -0.001  0.99934    
#> PAANHANG     1.014e+00  9.371e-01   1.082  0.27917    
#> PTRACTOR     7.229e-01  4.278e-01   1.690  0.09107 .  
#> PWERKT      -5.525e+00  4.805e+03  -0.001  0.99908    
#> PBROM        2.170e-01  4.865e-01   0.446  0.65559    
#> PLEVEN      -2.382e-01  1.170e-01  -2.036  0.04173 *  
#> PPERSONG    -4.523e-01  2.094e+00  -0.216  0.82901    
#> PGEZONG      1.444e+00  1.029e+00   1.404  0.16033    
#> PWAOREG      8.239e-01  5.943e-01   1.386  0.16565    
#> PBRAND       2.401e-01  7.714e-02   3.113  0.00185 ** 
#> PZEILPL     -8.658e+00  3.261e+03  -0.003  0.99788    
#> PPLEZIER    -1.886e-01  3.259e-01  -0.579  0.56289    
#> PFIETS       3.664e-01  8.325e-01   0.440  0.65985    
#> PINBOED     -1.068e+00  8.764e-01  -1.219  0.22301    
#> PBYSTAND    -1.676e-01  3.321e-01  -0.505  0.61373    
#> AWAPART     -9.293e-01  7.802e-01  -1.191  0.23364    
#> AWABEDR      4.197e-01  1.082e+00   0.388  0.69824    
#> AWALAND      2.762e-01  3.528e+00   0.078  0.93758    
#> APERSAUT    -3.902e-02  1.772e-01  -0.220  0.82566    
#> ABESAUT     -7.298e+01  2.417e+03  -0.030  0.97591    
#> AMOTSCO      2.418e-01  3.772e-01   0.641  0.52142    
#> AVRAAUT     -4.490e+00  1.078e+04   0.000  0.99967    
#> AAANHANG    -1.351e+00  1.687e+00  -0.801  0.42322    
#> ATRACTOR    -2.376e+00  1.524e+00  -1.559  0.11899    
#> AWERKT      -8.749e-01  9.682e+03   0.000  0.99993    
#> ABROM       -1.060e+00  1.549e+00  -0.684  0.49367    
#> ALEVEN       4.789e-01  2.245e-01   2.133  0.03291 *  
#> APERSONG     3.997e-01  4.329e+00   0.092  0.92644    
#> AGEZONG     -3.163e+00  2.706e+00  -1.169  0.24247    
#> AWAOREG     -3.212e+00  3.433e+00  -0.936  0.34939    
#> ABRAND      -4.118e-01  2.787e-01  -1.477  0.13956    
#> AZEILPL      1.047e+01  3.261e+03   0.003  0.99744    
#> APLEZIER     2.516e+00  1.010e+00   2.490  0.01276 *  
#> AFIETS       2.318e-01  5.699e-01   0.407  0.68420    
#> AINBOED      1.947e+00  1.412e+00   1.378  0.16812    
#> ABYSTAND     1.078e+00  1.103e+00   0.977  0.32870    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 2635.5  on 5821  degrees of freedom
#> Residual deviance: 2243.5  on 5736  degrees of freedom
#> AIC: 2415.5
#> 
#> Number of Fisher Scoring iterations: 17
vif(fit)
#>      MOSTYPE     MAANTHUI      MGEMOMV     MGEMLEEF     MOSHOOFD       MGODRK 
#> 1.176526e+02 1.129405e+00 3.502960e+00 1.843806e+00 1.159760e+02 3.254979e+00 
#>       MGODPR       MGODOV       MGODGE       MRELGE       MRELSA       MRELOV 
#> 1.149987e+01 3.403489e+00 9.654770e+00 2.092480e+01 4.738550e+00 1.755829e+01 
#>     MFALLEEN     MFGEKIND     MFWEKIND     MOPLHOOG     MOPLMIDD     MOPLLAAG 
#> 1.396958e+01 1.484774e+01 2.401910e+01 1.697427e+01 1.707886e+01 2.881044e+01 
#>     MBERHOOG     MBERZELF     MBERBOER     MBERMIDD     MBERARBG     MBERARBO 
#> 1.005929e+01 2.005450e+00 2.032457e+00 9.340159e+00 6.998443e+00 6.263894e+00 
#>         MSKA        MSKB1        MSKB2         MSKC         MSKD       MHHUUR 
#> 1.143798e+01 5.439495e+00 5.554214e+00 1.173789e+01 3.231966e+00 1.742645e+09 
#>       MHKOOP        MAUT1        MAUT2        MAUT0      MZFONDS       MZPART 
#> 1.742645e+09 1.434527e+01 8.048257e+00 1.242105e+01 1.102943e+09 1.102943e+09 
#>      MINKM30     MINK3045     MINK4575     MINK7512     MINK123M      MINKGEM 
#> 1.022040e+01 1.002438e+01 9.896905e+00 5.441998e+00 1.649801e+00 5.338355e+00 
#>     MKOOPKLA      PWAPART      PWABEDR      PWALAND     PPERSAUT      PBESAUT 
#> 2.649655e+00 4.429193e+01 6.632572e+00 3.321309e+01 3.113930e+00 9.398932e+06 
#>      PMOTSCO      PVRAAUT     PAANHANG     PTRACTOR       PWERKT        PBROM 
#> 3.450522e+00 5.470096e+01 1.397225e+01 1.333457e+01 1.707055e+02 1.778348e+01 
#>       PLEVEN     PPERSONG      PGEZONG      PWAOREG       PBRAND      PZEILPL 
#> 4.287040e+00 1.688445e+01 2.897539e+01 3.110765e+01 6.246955e+00 2.891237e+06 
#>     PPLEZIER       PFIETS      PINBOED     PBYSTAND      AWAPART      AWABEDR 
#> 6.860219e+00 7.914642e+00 6.966391e+00 1.333206e+01 4.493341e+01 5.984836e+00 
#>      AWALAND     APERSAUT      ABESAUT      AMOTSCO      AVRAAUT     AAANHANG 
#> 3.329385e+01 3.236914e+00 9.398929e+06 3.397310e+00 5.470096e+01 1.391263e+01 
#>     ATRACTOR       AWERKT        ABROM       ALEVEN     APERSONG      AGEZONG 
#> 1.327892e+01 1.707055e+02 1.775902e+01 4.383663e+00 1.691406e+01 2.912335e+01 
#>      AWAOREG       ABRAND      AZEILPL     APLEZIER       AFIETS      AINBOED 
#> 3.115261e+01 5.907867e+00 2.891237e+06 6.611109e+00 7.924550e+00 6.980250e+00 
#>     ABYSTAND 
#> 1.320138e+01

Created on 2020-08-26 by the reprex package (v0.3.0)
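
As for the cor() error: cor requires every column to be numeric, and Purchase is a factor. A minimal sketch of one way around it, on a made-up data frame (toy and its columns are hypothetical), is to keep only the numeric columns before calling cor.

```r
# Hypothetical small data frame with two numeric columns and a factor response.
toy <- data.frame(
  a = c(1, 2, 3, 4, 5, 6),
  b = c(2, 1, 4, 3, 6, 5),
  Purchase = factor(c("No", "No", "Yes", "No", "Yes", "Yes"))
)

# Flag the numeric columns, then compute correlations on those only.
numeric_cols <- vapply(toy, is.numeric, logical(1))
cor_mat <- cor(toy[, numeric_cols])
cor_mat
```

On my_data the same two lines should drop Purchase and return an 85 by 85 matrix, assuming every other column is numeric as the str output suggests.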

Thanks for the details. This has helped, and I am able to progress in creating multiple models.

Based on vif, I had to remove more than half of the columns. I tried creating a new data frame with only the required variables using the command below. It didn't work.

> my_data_new = my_data %>% select(- "MOSTYPE", - "MOSHOOFD", - "MGODPR", - "MRELGE", - "MFWEKIND", - "MOPLLAAG", - "MBERHOOG", "MBERMIDD", - "MSKA", - "MSKC", - "MHHUUR", - "MHKOOP", - "MAUT1",
+                                  - "MAUT0", - "MZFONDS", - "MZPART", - "MINKM30", - "MINK3045", - "MINK4575", - "PWAPART", - "PWALAND", - "PBESAUT", - "PVRAAUT", - "PAANHANG", - "PTRACTOR",
+                                  - "PWERKT", - "PBROM", - "PPERSONG", - "PGEZONG", - "PWAOREG", - "PBRAND", - "PZEILPL", - "PPLEZIER", - "PFIETS", "PBYSTAND", - "AWAPART", - "AWALAND", 
+                                  - "ABESAUT", - "AVRAAUT", - "AAANHANG", - "ATRACTOR", - "AWERKT", - "ABROM", - "APERSONG", - "AGEZONG", - "AWAOREG", - "AZEILPL", - "AFIETS", - "AINBOED")
Error in select(., -"MOSTYPE", -"MOSHOOFD", -"MGODPR", -"MRELGE", -"MFWEKIND",  : 
  unused arguments (-"MOSTYPE", -"MOSHOOFD", -"MGODPR", -"MRELGE", -"MFWEKIND", -"MOPLLAAG", -"MBERHOOG", "MBERMIDD", -"MSKA", -"MSKC", -"MHHUUR", -"MHKOOP", -"MAUT1", -"MAUT0", -"MZFONDS", -"MZPART", -"MINKM30", -"MINK3045", -"MINK4575", -"PWAPART", -"PWALAND", -"PBESAUT", -"PVRAAUT", -"PAANHANG", -"PTRACTOR", -"PWERKT", -"PBROM", -"PPERSONG", -"PGEZONG", -"PWAOREG", -"PBRAND", -"PZEILPL", -"PPLEZIER", -"PFIETS", "PBYSTAND", -"AWAPART", -"AWALAND", -"ABESAUT", -"AVRAAUT", -"AAANHANG", -"ATRACTOR", -"AWERKT", -"ABROM", -"APERSONG", -"AGEZONG", -"AWAOREG", -"AZEILPL", -"AFIETS", -"AINBOED")

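Are you sure you are using dplyr::select and not select from another namespace? An "unused arguments" error from select usually means that another attached package is masking dplyr's version.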
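
One quick way to check, sketched here with base R's find(), is to list where select is defined in the current session. The result depends entirely on which packages happen to be attached.

```r
# List every attached package or environment that defines `select`.
# The first entry is the one a bare select() call will use; an empty
# result means no attached package defines it.
select_sources <- find("select")
select_sources

# Prefixing the namespace avoids the ambiguity (needs dplyr installed), e.g.
# my_data_new <- dplyr::select(my_data, -MOSTYPE, -MOSHOOFD)
```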

Thanks. I have resolved this issue. It was down to minor typos.

Column names are normally passed to select unquoted. One way to do this is to collect, on the basis of vif, the names of the columns to drop in a character vector, call it exclusions, and pass that to select:

suppressPackageStartupMessages({library(dplyr)
  library(ISLR)
  library(car)})
my_data <- Caravan
exclusions <- colnames(my_data)[1:3] #for illustration
my_data %>% select(!all_of(exclusions)) %>% head()  %>% .[1:5]
#>   MGEMLEEF MOSHOOFD MGODRK MGODPR MGODOV
#> 1        2        8      0      5      1
#> 2        2        8      1      4      1
#> 3        2        8      0      4      2
#> 4        3        3      2      3      2
#> 5        2       10      1      4      1
#> 6        1        5      0      5      0

Created on 2020-09-01 by the reprex package (v0.3.0)
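
To build exclusions from the vif output rather than by position, something like the following sketch may help. The five values are copied from the vif(fit) output above; the cutoff of 10 is only a common rule of thumb and an assumption here, and toy is a hypothetical placeholder data frame.

```r
# A few values copied from the vif(fit) output above.
vif_values <- c(MOSTYPE = 117.6526, MAANTHUI = 1.129405, MGEMOMV = 3.502960,
                MGEMLEEF = 1.843806, MOSHOOFD = 115.9760)

# Assumed cutoff: drop any variable whose VIF exceeds 10.
vif_cutoff <- 10
exclusions <- names(vif_values)[vif_values > vif_cutoff]
exclusions

# Base R equivalent of select(!all_of(exclusions)), shown on a placeholder
# data frame that has the same column names.
toy <- as.data.frame(matrix(0, nrow = 2, ncol = 5,
                            dimnames = list(NULL, names(vif_values))))
kept <- toy[, setdiff(names(toy), exclusions), drop = FALSE]
names(kept)
```

With the real model, vif_values would be vif(fit). Since removing one variable changes the VIF of the others, it is probably safer to drop a few at a time and refit than to drop everything above the cutoff at once.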

Thanks @technocrat .. took similar approach to resolve..
