Am I correct on the following?
- There are 5 possibilities for the Behaviour column: AE, AI, PO, PA, and PI. You want to group AE and AI as "active" and PO, PA, and PI as "passive".
- There are 7 possibilities for the Name column: Cont 1, Cont 5, Cont 9, LPD 11, LPD 6, LPD 7, LPD 8. You already grouped them into Cont and LPD in the Condition column.
- You want to build a regression to predict whether Behaviour is active or passive, using Hour, Minute, Condition, and days after birth.
If that is correct, I'd suggest the following:
First, create an indicator column about whether behaviour is active -- the output should be 0 or 1
active_behaviour <- c("AE", "AI")
mb <- mb %>%
  mutate(is_active = if_else(Behaviour %in% active_behaviour, 1, 0))
Also, make an indicator variable for whether the condition is Cont (1) or LPD (0):
mb <- mb %>%
  mutate(is_cont = if_else(Condition == "Cont", 1, 0))
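A quick way to check both indicator columns is to build them on a small data frame and tabulate the result. This is only a sketch: the rows below are invented for illustration, and it assumes `active_behaviour` is the vector `c("AE", "AI")` and that Condition holds the strings "Cont" and "LPD".

```r
library(dplyr)

# Toy rows, invented for illustration only
toy <- data.frame(
  Behaviour = c("AE", "PO", "AI", "PI", "PA"),
  Condition = c("Cont", "LPD", "LPD", "Cont", "Cont"),
  stringsAsFactors = FALSE
)

# Assumed grouping: AE and AI count as active
active_behaviour <- c("AE", "AI")

toy <- toy %>%
  mutate(is_active = if_else(Behaviour %in% active_behaviour, 1, 0),
         is_cont   = if_else(Condition == "Cont", 1, 0))

# Each behaviour code / condition should map to exactly one indicator value
count(toy, Behaviour, is_active)
count(toy, Condition, is_cont)
```

Running the same two `count()` calls on mb should show every code landing in the group you expect.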
Then, I'd turn Days_after_birth into a continuous number -- right now it's a character vector in the form of a date:
mb <- mb %>%
  mutate(date0 = parse_date(Days_after_birth, format = "%d.%m.%Y"),  # parse_date() is from readr
         birth_date = as.Date("1981-07-01"),
         age_in_days = as.numeric(date0 - birth_date))
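To see what this conversion produces, here is a small sketch on invented date strings. It uses base R's `as.Date()` in place of `parse_date()` so it runs without readr; the three strings are made up, and the birth date is the one from the snippet above.

```r
# Invented example strings in the same day.month.year format
raw <- c("01.07.1981", "15.07.1981", "01.08.1981")

date0 <- as.Date(raw, format = "%d.%m.%Y")
birth_date <- as.Date("1981-07-01")

# Subtracting two Dates gives a difftime in days; as.numeric() drops the units
age_in_days <- as.numeric(date0 - birth_date)
age_in_days
# 0 14 31
```

If any value comes back as NA, the string did not match the format and is worth inspecting before fitting the model.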
Only NOW would I fit the model, with something like:
glm(formula = is_active ~ Hour + Minute + is_cont + age_in_days,
    data = mb,
    family = binomial)
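The coefficients from a binomial `glm` are on the log-odds scale, so it can help to look at odds ratios and fitted probabilities as well. Here is a minimal sketch on simulated data: `sim` is a hypothetical stand-in for mb, every value in it is invented, and only the column names match the model above.

```r
set.seed(1)

# Simulated stand-in for mb; all values are invented for illustration
n <- 200
sim <- data.frame(
  Hour        = sample(0:23, n, replace = TRUE),
  Minute      = sample(0:59, n, replace = TRUE),
  is_cont     = rbinom(n, 1, 0.5),
  age_in_days = sample(1:30, n, replace = TRUE)
)
# Make activity depend on condition only, so there is something to detect
sim$is_active <- rbinom(n, 1, plogis(-1 + 0.8 * sim$is_cont))

fit <- glm(is_active ~ Hour + Minute + is_cont + age_in_days,
           data = sim, family = binomial)

summary(fit)                           # estimates on the log-odds scale
exp(coef(fit))                         # the same estimates as odds ratios
head(predict(fit, type = "response"))  # fitted probabilities of being active
```

Storing the fit in an object (here `fit`) rather than just printing it makes these follow-up calls possible.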
Does this make sense?