Arguments imply differing number of rows. data.frame() Error


#1

Hello,

> xfactors<-model.matrix(Class ~ Sales,data=down_train)[,-1]
> x = as.matrix(data.frame(cust_prog_level,CUST_REGION_DESCR,xfactors, data = down_train))
Error in data.frame(cust_prog_level, CUST_REGION_DESCR, xfactors, data = down_train) : 
  arguments imply differing number of rows: 991205, 377240

> NROW(down_train$cust_prog_level)
[1] 377240
> NROW(xfactors)
[1] 377240
> NROW(down_train$CUST_REGION_DESCR)
[1] 377240

How can I solve this error?


#2

I am not sure why you have a data= in your data.frame call. data.frame() does not have a data argument. Try:

x = as.matrix(data.frame(down_train$cust_prog_level, down_train$CUST_REGION_DESCR, xfactors))
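To see why the data= causes trouble, here is a minimal sketch on toy data (df and z are made-up names, not your down_train). Since data.frame() has no data argument, data = df is simply treated as more columns to bind, and every argument then has to have a compatible number of rows:

```r
# Toy data; `df` and `z` are hypothetical stand-ins for illustration only.
df <- data.frame(a = 1:3, b = c("x", "y", "z"))

# `data = df` is not a lookup scope: its columns are appended,
# with the argument name "data" used as a prefix.
ok <- data.frame(z = 4:6, data = df)
print(names(ok))

# An argument with a different number of rows triggers the same kind of
# error as in the question.
msg <- tryCatch(
  data.frame(z = 1:5, data = df),
  error = function(e) conditionMessage(e)
)
print(msg)
```

In the question, one of the bare names presumably resolved to an object with 991205 rows while down_train has 377240, which would explain the two numbers in the message.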

#3

@cderv
Hello Chris,

> xfactors<-model.matrix(Class ~ CUST_REGION_DESCR + cust_prog_level,data=down_train)[,-1]
> x = as.matrix(data.frame(down_train$Sales,xfactors))
> glmmod = glmnet(x,y=as.factor(Class),alpha=1,family='binomial')
Error in is.factor(x) : object 'Class' not found

Somehow "Class", which is my binomial variable, disappears.
I am trying to build a logistic regression model. glmnet() requires a matrix as input, which explains why I called as.matrix().
I followed a post on Stack Overflow.
The author of the post said there was no error when he or she ran glmnet(), but I am having an issue with that.

How can I approach this error? It is obvious that the object "Class" is not found, since when I call head(x) I get:

> head(x)
  down_train.Sales CUST_REGION_DESCRMOUNTAIN.WEST.REGION CUST_REGION_DESCRNORTH.CENTRAL.REGION
1             5.52                                     0                                     0
2           172.50                                     0                                     1
3            72.00                                     0                                     0
4            30.94                                     0                                     0
5           314.70                                     0                                     0
6           157.35                                     0                                     0
  CUST_REGION_DESCRNORTH.EAST.REGION CUST_REGION_DESCROHIO.VALLEY.REGION
1                                  0                                   1
2                                  0                                   0
3                                  0                                   0
4                                  0                                   0
5                                  0                                   0
6                                  0                                   1
  CUST_REGION_DESCRSOUTH.CENTRAL.REGION CUST_REGION_DESCRSOUTH.EAST.REGION CUST_REGION_DESCRWESTERN.REGION
1                                     0                                  0                               0
2                                     0                                  0                               0
3                                     1                                  0                               0
4                                     1                                  0                               0
5                                     0                                  0                               1
6                                     0                                  0                               0
  cust_prog_levelC cust_prog_levelD cust_prog_levelE cust_prog_levelG cust_prog_levelI cust_prog_levelL
1                0                0                0                0                0                0
2                0                0                0                0                0                0
3                0                0                0                0                0                0
4                0                0                0                0                0                0
5                1                0                0                0                0                0
6                0                0                0                0                0                0
  cust_prog_levelM cust_prog_levelN cust_prog_levelP cust_prog_levelR cust_prog_levelS cust_prog_levelX
1                0                1                0                0                0                0
2                0                0                1                0                0                0
3                0                1                0                0                0                0
4                0                1                0                0                0                0
5                0                0                0                0                0                0
6                0                0                1                0                0                0
  cust_prog_levelZ
1                0
2                0
3                0
4                0
5                0
6                0

Thanks!
@jcblum


#4

I’ll add that the call to data.frame should have failed for a different reason — that neither of those first 2 objects could be found. That it didn’t implies that you have either created separate variables with those names at some point (confusing and therefore dangerous!) or attach()ed your down_train data frame at some point (also confusing and therefore dangerous!). Might be a good time to restart your R session (assuming you know how to recreate any important objects — and if not, that’s another important problem to solve!).
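A small sketch of how that kind of masking can be spotted, using toy objects (down and level are hypothetical names, not the ones from your session):

```r
# Hypothetical toy objects, only to illustrate name masking.
down <- data.frame(level = c("A", "B", "C"))
level <- c("A", "B", "C", "D", "E")  # stray object with the same name as a column

# The bare name resolves to the stray vector, not to the column.
n_global <- NROW(level)
n_column <- NROW(down$level)
print(c(global = n_global, column = n_column))

# Two quick checks in a suspicious session:
print(exists("level"))  # TRUE means a free-standing object with that name is visible
print(search())         # an attach()ed data frame would be listed here by name
```

If the row counts of the bare name and the $ column differ, as they do here, the bare name is not pointing at the data frame you think it is.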


#5

R doesn't know where to find Class, since it's a variable in a data frame, but you haven't told glmnet that. To refer to Class in this context, you need to specify it as down_train$Class.

This is admittedly a confusing area because some functions in R (and in R packages) allow shortcuts via what's called Non-Standard Evaluation, so that you don't always have to fully specify variables from a data frame. One case you've already seen: when a function takes a formula and a data parameter, then the variables in the formula do not have to be fully specified. But that's a special situation, not the default.
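Here is a minimal illustration of the difference on toy data (d is a made-up data frame, and the sketch assumes there is no free-standing object called Class in the session):

```r
# Toy data frame; `d` is a hypothetical stand-in for down_train.
d <- data.frame(Class = factor(c("a", "b", "a", "b")), Sales = c(1, 2, 3, 4))

# Formula + data: names in the formula are looked up inside `d`.
mm <- model.matrix(Class ~ Sales, data = d)
print(dim(mm))

# No formula and no data argument: a bare column name is an ordinary
# variable lookup, so it fails when no such object exists.
msg <- tryCatch(as.factor(Class), error = function(e) conditionMessage(e))
print(msg)

# Fully specified, it works.
y <- as.factor(d$Class)
print(levels(y))
```

glmnet() takes x and y as plain objects rather than a formula plus data, so it falls into the second case.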


#6

@jcblum
I am sorry, I don't know your name, so I will call you jcblum for now.
One more request: where can I find a good, laid-back explanation of the glmnet package?
For instance, I want to understand what the output of glmnet() is telling me.
Maybe a source with an example, a result, and an interpretation?

I have found several sources on Stack Overflow and the Stanford website (https://web.stanford.edu/~hastie/glmnet/glmnet_alpha.html),
but I still don't understand how it can be classified as "logistic regression".

Or maybe we could all look at my model, and you could help me understand the result.

> xfactors<-model.matrix(Class ~ CUST_REGION_DESCR + cust_prog_level,data=down_train)
> x = as.matrix(data.frame(down_train$Sales,xfactors))
> cvfit = cv.glmnet(x, y=as.factor(down_train$Class), alpha=1, family="binomial")
> summary(cvfit)
           Length Class  Mode     
lambda     58     -none- numeric  
cvm        58     -none- numeric  
cvsd       58     -none- numeric  
cvup       58     -none- numeric  
cvlo       58     -none- numeric  
nzero      58     -none- numeric  
name        1     -none- character
glmnet.fit 13     lognet list     
lambda.min  1     -none- numeric  
lambda.1se  1     -none- numeric  
call        5     -none- call   

  1. I think "cv" stands for "cross-validation". What does it do in a logistic regression model?

  2. I know the lm() function builds a linear regression model. Does glmnet() also build a regression model that predicts how "Class" responds to the independent variables on the right-hand side?

glmmod = glmnet(x,y=as.factor(down_train$Class),alpha=1,family='binomial')
  1. What does "alpha = 1" mean?

Thanks all! @cderv