reference class failed

Hi all, I have a question about releveling the reference class in logistic regression. I used exactly the same code my professor gave us, but the reference class is still not set the way I specified. Does anyone know why?

Below is my script. For some reason the forum does not accept my CSV file. I don't mean to be rude, but if you can spot something, that would be great. Thanks a lot!

# install and activate the following package
#      "regclass"
# R-Studio logistic regression with OrganicsData.csv data
OrganicsData <- read.csv("OrganicsData.csv")
ogdata <- OrganicsData

summary(ogdata)
attach(ogdata)

## relevel DemGender & PromClass
DemGender <- relevel(DemGender, ref = "M")
PromClass <- relevel(PromClass, ref = "Silver")

## since the outcome variable TargetBuy is stored as integer
## convert the column into factors
TargetBuy <- as.factor(TargetBuy)

# fit a logistic regression model with the given set of variables
finalmodel <- glm(TargetBuy ~.-ï..ID,  
                  data = ogdata, family = binomial)

# View estimation results
summary(finalmodel)


# compute and display odds ratios
oddsratios <- exp(coef(finalmodel))
oddsratios
# predicting the probabilities for individual cases from model
predicted_probabilities <- predict(finalmodel, ogdata, type= "response")

View(predicted_probabilities)

## setting a threshold to then predict 1 or 0
threshold <- 0.5
predictions <- factor( ifelse(predicted_probabilities >= threshold, 1, 0))

odata <- ogdata
odata <- cbind(odata, predicted_probabilities, predictions)

View(odata)

# Confusion Matrix for the Logistic Regression model
# install and activate the following package
#      regclass
## use the confusion_matrix() function from this package
confusion_matrix(finalmodel)

Quick suggestion: take a look at this S/O post; it may be a zero-indexing issue.

Hi technocrat,

Thanks a lot for contributing.
I did follow the instructions to set the reference level. However, my question remains: I set "Male" as the reference level, and it still appears in the final output.
PromClass behaves a bit differently. I set "Silver" as the reference class, but the output shows PromClassGold, and I am curious why that happens.

Is ogdata available online anywhere?

I only found it here:
https://www.coursehero.com/file/p738ol9/The-ORGANICS-data-set-available-in-SAS-Metadata-Repository-in-library-AAEM-the/

Thanks,
Sophia

Thanks, Sophia. Unfortunately, I can't unlock that without a SAS installation, which I don't have. I'll try to replicate a similar database and see what I can figure out.

Hi,

I have sent you the data file in a LinkedIn message. I really appreciate your help!

Good night,
Sophia

Thanks. I would post this as a gist, so others can follow along, but I'm not sure of the IP restrictions.

Rather than attaching ogdata, I changed the values in place. After attach(), an assignment such as TargetBuy <- as.factor(TargetBuy) creates a separate copy in the global environment, and ogdata itself does not reflect that change, e.g.,

> class(TargetBuy)
[1] "factor"
> class(ogdata$TargetBuy)
[1] "integer"
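
The mismatch shown above can be reproduced with a small sketch. The data frame `toy` and its values are made up for illustration; only the column name follows the thread.

```r
# Toy data frame standing in for ogdata (values are made up)
toy <- data.frame(TargetBuy = c(0L, 1L, 1L, 0L))

attach(toy)
# This creates a new variable in the global environment;
# it does not modify the TargetBuy column inside toy
TargetBuy <- as.factor(TargetBuy)
detach(toy)

global_class <- class(TargetBuy)      # class of the global copy
column_class <- class(toy$TargetBuy)  # class of the data frame column
global_class
column_class

# Changing the column in place keeps the data frame and the model in sync
toy$TargetBuy <- as.factor(toy$TargetBuy)
class(toy$TargetBuy)
```

Because glm(..., data = ogdata) looks up variables in the data frame first, changes made only to the global copies would not reach the model.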

To get to your question, however, I have to get past the specification of finalmodel

finalmodel <- glm(TargetBuy ~.-ï..ID,  
                  data = ogdata, family = binomial)

doesn't work, obviously. Do you mean

finalmodel <- glm(TargetBuy ~ DemGender + PromClass + ID,  
                  data = ogdata, family = binomial)

or

finalmodel <- glm(TargetBuy ~ .,  
                  data = ogdata, family = binomial)

which give quite different coefficients.

What I meant was to predict TargetBuy with all the independent variables except ï..ID. That's why I wrote:
finalmodel <- glm(TargetBuy ~.-ï..ID,
                  data = ogdata, family = binomial)
"-ï..ID" means excluding that variable.

Thanks a lot,
Sophia
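
For readers following along, one way to check which predictors a formula like this actually keeps is to expand it against the data with terms(). This is only a sketch: the data frame `toy` and its columns are hypothetical, with `ID` playing the role of ï..ID.

```r
# Toy data with made-up columns; ID plays the role of the ï..ID column
toy <- data.frame(
  ID = 1:4,
  y  = c(0, 1, 1, 0),
  x1 = c(2, 5, 3, 1),
  x2 = c(1, 0, 1, 0)
)

# Expand the formula against the data to list the predictors actually used
predictors <- attr(terms(y ~ . - ID, data = toy), "term.labels")
predictors
```

In an R formula the dot stands for every column of the data that is not otherwise named in the formula, and the minus sign then drops ID from that set.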

Thanks, I was confused by the umlaut over i

Sure, I think R changes the name "ID" to "ï..ID" when I import the dataset.
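
A likely cause of the "ï.." prefix is a UTF-8 byte order mark at the start of the CSV file, which some spreadsheet tools add on export. The sketch below assumes that is what happened here; it writes a small made-up file with such a mark and reads it back using read.csv()'s fileEncoding argument.

```r
# Write a tiny CSV that starts with a UTF-8 byte order mark (made-up content)
tmp <- tempfile(fileext = ".csv")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
writeBin(c(bom, charToRaw("ID,DemAge\n1,40\n2,55\n")), tmp)

# fileEncoding = "UTF-8-BOM" asks R to drop the mark while reading
clean <- read.csv(tmp, fileEncoding = "UTF-8-BOM")
names(clean)
```

If the real file does carry a byte order mark, reading it this way should give a plain ID column, so the formula would no longer need the ï..ID spelling.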

I'm going to be using

finalmodel <- glm(TargetBuy ~ DemAffl + DemAge + DemGender + PromClass +
                    PromSpend + PromTime,
                  data = ogdata, family = binomial)

for two reasons:

  1. From the Zen of Python: explicit is better than implicit
  2. Including all except ID brings in TargetBuy on both sides

and I'm also using a function to add confidence bands to odds ratios

odr <- function(x) {
  exp(cbind(OR = coef(x), confint(x)))
}
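
As a usage sketch for the helper, here it is applied to a model fitted on the built-in mtcars data, since the Organics data is not public. The choice of `am ~ wt` is purely illustrative and not part of the original analysis.

```r
# Same helper as above, repeated so the example runs on its own
odr <- function(x) {
  exp(cbind(OR = coef(x), confint(x)))
}

# mtcars ships with R; am (0/1) stands in for TargetBuy here
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# confint() prints a progress message while profiling, so silence it
or_table <- suppressMessages(odr(fit))
or_table
```

The result is a matrix with one row per coefficient: the odds ratio in the first column and the confidence limits, also on the odds-ratio scale, in the other two.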

Hi,

Looking cool!
My question is: is your output what you expected? Do the outputs for DemGender and PromClass leave out the base classes, "Male" and "Silver" respectively?

Thanks,
Sophia

OK, got it. The reference levels should be quoted as strings

library(dplyr)  # provides %>% and mutate()

ogdata <- ogdata %>% mutate(DemGender = relevel(DemGender, ref = "M"))
ogdata <- ogdata %>% mutate(PromClass = relevel(PromClass, ref = "Silver"))

So,

  1. Don't attach
  2. Don't include TargetBuy on both sides of the tilde ~ operator
  3. Quote the reference class names as strings
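
The points above can be sketched in base R on a small made-up data frame. The rows and level names are assumptions for illustration; only the column names follow the thread.

```r
# Made-up rows; column names follow the thread
toy <- data.frame(
  DemGender = c("M", "M", "F", "F", "U", "U", "M", "F"),
  PromClass = c("Gold", "Silver", "Tin", "Platinum",
                "Silver", "Gold", "Tin", "Silver")
)

# relevel() needs a factor, so convert first, then set the baseline in place
toy$DemGender <- relevel(factor(toy$DemGender), ref = "M")
toy$PromClass <- relevel(factor(toy$PromClass), ref = "Silver")

levels(toy$DemGender)  # the first level is the reference
levels(toy$PromClass)

# glm() builds its coefficient names from these design matrix columns
dummy_cols <- colnames(model.matrix(~ DemGender + PromClass, data = toy))
dummy_cols
```

The reference levels get no column of their own, so a term such as PromClassGold in the summary is expected: it is the contrast of Gold against the Silver baseline.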

Before you go further into predict, review the concept of newdata.
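
As a minimal sketch of what newdata does, the example below fits a model on the built-in mtcars data and scores two hypothetical cases that were not used for fitting. The model and the values in `new_cases` are illustrative assumptions, not the Organics analysis.

```r
# mtcars ships with R; am (0/1) plays the role of TargetBuy
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Hypothetical unseen cases: column names must match the model's predictors
new_cases <- data.frame(wt = c(2.0, 3.5))

# type = "response" returns probabilities rather than log-odds
new_probs <- predict(fit, newdata = new_cases, type = "response")
new_probs
```

Passing the training data itself, as in predict(finalmodel, ogdata, type = "response"), gives in-sample probabilities, which tend to look better than performance on cases the model has not seen.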

Good luck!

Hi technocrat,

Gotcha! Thanks for the help! Have an awesome Sunday!

Sophia