# Family Choice for GLM with 0-1 Dependent Variable

I want to analyze the relationship between the dependent variable "Knowledge_index" (a numeric variable ranging from 0 to 1, both inclusive) and several independent variables of type "factor": "Age", "Gender", "Place_birth", "Residence_time", "Education" and "Professional_sector".

    str(data$Knowledge_index)
    num [1:430] 0.88 0.63 0.75 0.25 1 1 0.88 0.75 0.88 0 ...

The model:

    glm1 <- glm(Knowledge_index ~ Age + Gender + Place_birth + Residence_time +
                  Education + Professional_sector,
                data = data, family = quasibinomial(link = "logit"))

Since the dependent variable takes values between 0 and 1, I understand that it is correct to use a quasibinomial distribution with a logit link function (`family = quasibinomial(link = "logit")`). Is this correct, or should I use another family?

A histogram of the dependent variable is also attached.

Below is more information about how the index (the dependent variable) is constructed. It is built from 4 questions asked to the respondent (I1, I2, I3, I4), and the calculation is:

    Knowledge_index = (0.25 * I1) + (0.25 * I2) + (0.25 * I3) + (0.25 * I4)

For example, if the respondent answers all 4 questions correctly, the result is

    Knowledge_index = (0.25 * 1) + (0.25 * 1) + (0.25 * 1) + (0.25 * 1) = 1

If the respondent answers three questions correctly and one incorrectly:

    Knowledge_index = (0.25 * 1) + (0.25 * 1) + (0.25 * 1) + (0.25 * 0) = 0.75
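The calculation above can be sketched in R as follows. The `answers` data frame and its values are made up for illustration, and the 1 = correct / 0 = incorrect coding is assumed from the worked examples.

```r
# Hypothetical responses: 1 = correct, 0 = incorrect (made-up values)
answers <- data.frame(
  I1 = c(1, 1, 0),
  I2 = c(1, 1, 0),
  I3 = c(1, 1, 1),
  I4 = c(1, 0, 0)
)

# Equal weights of 0.25 per question, as in the formula above
answers$Knowledge_index <- with(
  answers,
  0.25 * I1 + 0.25 * I2 + 0.25 * I3 + 0.25 * I4
)

# With equal weights the index is simply the share of correct answers
share_correct <- unname(rowMeans(answers[, c("I1", "I2", "I3", "I4")]))
stopifnot(isTRUE(all.equal(answers$Knowledge_index, share_correct)))
```

Because the weights are equal, the index is the same as the proportion of the 4 questions answered correctly.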

Thank you very much in advance!!

If `Knowledge_index` represents a proportion (say, a share of total knowledge), the quasibinomial family with the logit link is appropriate.
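As a rough sketch of what such a fit looks like, the example below uses simulated data rather than the survey from the question. The data frame `sim`, its two factors, and the coefficients used to generate the response are assumptions made up for illustration only.

```r
set.seed(1)  # simulated data only; not the survey data from the question

n <- 200
sim <- data.frame(
  Gender    = factor(sample(c("F", "M"), n, replace = TRUE)),
  Education = factor(sample(c("low", "mid", "high"), n, replace = TRUE))
)

# Number of correct answers out of 4, rescaled to a proportion in {0, 0.25, ..., 1}
p <- plogis(-0.5 + 0.8 * (sim$Education == "high"))
sim$Knowledge_index <- rbinom(n, size = 4, prob = p) / 4

# Proportion response with a logit link; the dispersion is estimated, not fixed at 1
fit <- glm(Knowledge_index ~ Gender + Education,
           data = sim,
           family = quasibinomial(link = "logit"))

summary(fit)$dispersion  # estimated dispersion parameter
range(fitted(fit))       # fitted means lie strictly between 0 and 1
```

The logit link keeps the fitted means inside (0, 1), and the quasi family accepts a non-integer response such as 0.75 while estimating the dispersion from the data.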

