Multiple linear regression with many independent variables

After the code:

library(caTools)
sample.split(temperature$Month,SplitRatio = 0.65)-> split_model
subset(temperature, split_model==T)->train
subset(temperature, split_model==F)->test
nrow(train)

It works,
but then when I enter:

lm(Month~Country+Region+City+AvgTemperature,data=train)-> mod1
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
contrasts can be applied only to factors with 2 or more levels
It does not work, even after I cleared the RAM.

Please could you help me?

Thank you
Sara

Welcome to the R Studio Community!

Did you mean to use Month as the dependent variable and AvgTemperature as a feature? That looks backwards.

Thank you,

For this model, take Country, Region, City, and AvgTemperature as the independent variables and Month as the dependent variable.

What is the code for this model? The code that I sent you earlier does not work. It says:

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
contrasts can be applied only to factors with 2 or more levels
Please could you help me?

Thank you
Sara

Hi Sara,
Please can you put

str(train)

into your console and share the result with us?

sample.split(temperature2$Month,SplitRatio = 0.65)-> split_model
Error in sample.split(temperature2$Month, SplitRatio = 0.65) :
could not find function "sample.split"

> str(train)
'data.frame':	681573 obs. of  8 variables:
 $ Region        : chr  "Africa" "Africa" "Africa" "Africa" ...
 $ Country       : chr  "Algeria" "Algeria" "Algeria" "Algeria" ...
 $ State         : chr  NA NA NA NA ...
 $ City          : chr  "Algiers" "Algiers" "Algiers" "Algiers" ...
 $ Month         : chr  "1" "1" "1" "1" ...
 $ Day           : chr  "1" "2" "3" "5" ...
 $ Year          : chr  "1995" "1995" "1995" "1995" ...
 $ AvgTemperature: chr  "64.2" "49.4" "48.8" "47.9" ...

Now I have one more error on sample.split?

SARA

sample.split is only available to you when you have run library(caTools)

Your str() output shows that Month is a character column, and there is no principled way for lm() to fit a model that predicts it directly. You could, as a purely practical matter, convert it to an integer, although I am dubious about the statistical validity of that approach. Can you give us some more information about the context that brought you to this analysis, and tell us a bit about what your goal is?
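As a rough sketch of the conversion mentioned above: the str(train) output shows every column stored as chr, including AvgTemperature, so the numeric columns would need converting before any modelling. The toy data frame below is invented to mirror that structure; only the column names come from the thread.

```r
# Toy data mirroring the str(train) output (values are invented for illustration)
toy <- data.frame(
  Region         = c("Africa", "Africa", "Europe", "Europe"),
  Month          = c("1", "1", "2", "2"),
  AvgTemperature = c("64.2", "49.4", "30.1", "28.7"),
  stringsAsFactors = FALSE
)

# Convert text columns that hold numbers into numeric types
toy$Month          <- as.integer(toy$Month)
toy$AvgTemperature <- as.numeric(toy$AvgTemperature)

# Make the grouping column an explicit factor so its levels are easy to inspect
toy$Region <- factor(toy$Region)

str(toy)
```

Whether Month should be modelled as a number at all is a separate statistical question; this only shows the mechanics of the type change.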

The contrasts error means that one of your non-numeric independent variables has only a single unique value. You should remove that variable from the formula.
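One possible way to find the offending variable is to count the distinct values in each column. This is a minimal sketch on an invented data frame (the name toy_train and its values are assumptions, not the real data); the same sapply() call could be pointed at train.

```r
# Toy training set (invented values); Region has only one unique value
toy_train <- data.frame(
  Region         = c("Africa", "Africa", "Africa", "Africa"),
  Country        = c("Algeria", "Algeria", "Egypt", "Egypt"),
  AvgTemperature = c(64.2, 49.4, 70.3, 68.9),
  stringsAsFactors = FALSE
)

# Count distinct non-missing values in every column
n_unique <- sapply(toy_train, function(col) length(unique(na.omit(col))))
print(n_unique)

# A character or factor predictor with fewer than two distinct values
# is what triggers the contrasts error
single_valued <- names(n_unique)[n_unique < 2]
print(single_valued)
```

Any column name that appears in single_valued would be a candidate to drop from the formula.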

Thank you for your answer, it finally worked out.

I have another issue. I am trying to run a multiple logistic regression and it gives this error:

glm(gender~Dependents, data=train, family = "binomial")-> log_mod_multi
Error in eval(family$initialize) : y values must be 0 <= y <= 1

I converted everything to integer and it still does not work.

Please can you help?

Thank you

Sara

Factors are numbered from 1 upward internally, which you should be aware of if you want to manually convert to a numeric representation (which is not usually necessary).

train <- data.frame(
  gender = rep(c("F","M"),20),
  Dependents = round(runif(n=40,max=10),digits=0)
)

# 1. gender as created by data.frame()
str(train)
glm(gender~Dependents, data=train, family = "binomial")-> log_mod_multi_1

# 2. gender as a factor: glm() accepts a two-level factor response
train$gender <- factor(train$gender)
str(train)
glm(gender~Dependents, data=train, family = "binomial")-> log_mod_multi_2

# 3. gender as the factor's integer codes 1 and 2: fails, since y must lie in [0, 1]
train$gender <- as.integer(train$gender)
str(train)
glm(gender~Dependents, data=train, family = "binomial")-> log_mod_multi_3

# 4. gender shifted to 0 and 1: accepted by the binomial family
train$gender <- as.integer(train$gender) -1
str(train)
glm(gender~Dependents, data=train, family = "binomial")-> log_mod_multi_4

It does not work:

> train <- data.frame(gender = rep(c("F","M"),20)
+ Dependents = round(runif(n=40,max=10),digits=0)
Error: unexpected symbol in:
"train <- data.frame(gender = rep(c("F","M"),20)
Dependents"

Also, I'd like to include all the independent variables in the model at the same time.
Please could you help?
Thank you
SARA

Did you copy and paste my script or type it out yourself? I think you omitted a comma that my script has.
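For reference, here is a sketch of the data.frame() call with a comma between the arguments, extended to fit several independent variables in one model. The Tenure column is hypothetical and added only to illustrate the syntax; the values are random toy data.

```r
set.seed(1)  # make the random toy data reproducible

# Every argument except the last is followed by a comma
train <- data.frame(
  gender     = factor(rep(c("F", "M"), 20)),
  Dependents = round(runif(n = 40, max = 10), digits = 0),
  Tenure     = round(runif(n = 40, max = 72), digits = 0)  # hypothetical column
)

# Join several independent variables with + to fit them in a single model
log_mod_multi <- glm(gender ~ Dependents + Tenure, data = train, family = "binomial")

coef(log_mod_multi)
```

Writing the formula as gender ~ . would instead use every other column of the data frame as a predictor, which can be convenient when there are many of them.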
