Logistic regression - factor variables


I am currently working on a logistic regression in which the variable whether_worked (a binary 1/0 variable) is explained by the type of work experience gained during the study period.

The explanatory variable is the categorical variable work_experience, which takes the values:

  • mandatory internship
  • optional internship
  • volunteering
  • work in accordance with the field of study
  • work incompatible with the field of study
  • lack of experience

The question was multiple choice and the answers are stored in 3 columns (q5_1, q5_2, q5_3), so a respondent could give up to 3 options. How should I include this in the regression? When the dependent variable is explained by gender or education, the categories are mutually exclusive: you can't be female and male at the same time, and you can't have college and high school education at the same time. Here, in contrast, someone can have both a mandatory internship and volunteering.

Currently it looks like this:

glm(does_work ~ work_experience, family = "binomial")

Unfortunately, this means I do not include the answers from q5_2 and q5_3. I'll add that q5_2 and q5_3 are 70% empty, since most respondents gave only one main answer, but the remaining entries still contain interesting additional information.

Include one question for gender and one for education.

The main question is about the type of work experience during education. Most answers are in p5_1, but some observations also use columns p5_2 and p5_3, because you could choose up to 3. How do I use those?

If you want to include work experience, include the first four possible work answers and drop lack of experience. Or maybe copy and paste here a short piece of your real data.
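One way to read this suggestion is as a model with one TRUE/FALSE indicator per type of experience, leaving "lack of experience" out so that it does not get its own term. The sketch below uses simulated data only, and the column names (mandatory, optional, volunteering, field_work) are hypothetical, so the estimates themselves mean nothing; it only shows the shape of the formula.

```r
# Simulated data, for illustration only; column names are hypothetical.
set.seed(1)
n <- 200
d <- data.frame(
  mandatory    = rbinom(n, 1, 0.4) == 1,
  optional     = rbinom(n, 1, 0.3) == 1,
  volunteering = rbinom(n, 1, 0.2) == 1,
  field_work   = rbinom(n, 1, 0.3) == 1
)
d$does_work <- rbinom(n, 1, 0.5)  # random outcome, unrelated to the predictors

# One indicator per experience type; respondents with all indicators FALSE
# form the reference group captured by the intercept.
fit <- glm(does_work ~ mandatory + optional + volunteering + field_work,
           data = d, family = binomial)
summary(fit)
```

Because the indicators are not mutually exclusive, a respondent can have several of them TRUE at once, which a single factor variable cannot represent.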

These are the 3 columns that hold the answers to the question: What was your last work experience during the education period? You could give one answer out of 6 possibilities in 5_1, and then in 5_2 and 5_3 you could add more choices from the same 6 categories. Most people answered with just 1 type of work, but some of them also added information in 5_2 and 5_3. How can I use this in the regression? Normally someone is either a woman or a man, and someone has 1 level of education, but here you have multiple choices.

This is unique(p5_1)

I renamed the levels and put the variable into the regression.

But with this approach I lose the information in p5_2 and p5_3.

That is helpful. You are probably going to have to create new variables. Something like

mandatory <- p5_1 == "mandatory" | p5_2 == "mandatory" | p5_3 == "mandatory"
optional <- p5_1 == "optional" | p5_2 == "optional" | p5_3 == "optional"


(Sorry for putting it in English.)

By the way, for future reference, it is usually much better to post the data as text that can be copied rather than posting a picture.
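Since p5_2 and p5_3 are mostly empty, comparisons with == can return NA rather than FALSE for those rows (FALSE | NA is NA in R). A minimal sketch of the same idea using %in%, which returns FALSE for missing values, is shown here. The data frame, the answer labels, and the helper has_answer() are made up for illustration and are not from the real survey.

```r
# Toy data: made-up values, only to illustrate the three-column layout.
survey <- data.frame(
  does_work = c(1, 0, 1, 1, 0, 1),
  p5_1 = c("mandatory", "optional", "volunteering", "mandatory", "none", "optional"),
  p5_2 = c("volunteering", NA, NA, "optional", NA, NA),
  p5_3 = c(NA, NA, NA, "volunteering", NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE if the answer appears in any of the three columns.
# %in% gives FALSE (not NA) for missing values, unlike ==.
has_answer <- function(data, answer) {
  data$p5_1 %in% answer | data$p5_2 %in% answer | data$p5_3 %in% answer
}

survey$mandatory    <- has_answer(survey, "mandatory")
survey$optional     <- has_answer(survey, "optional")
survey$volunteering <- has_answer(survey, "volunteering")

survey[, c("mandatory", "optional", "volunteering")]
```

Each respondent then gets one logical column per type of experience, and those columns can go into glm() as separate predictors.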
