How can I alter my regression to look at a conditional outcome?

How do I run a regression when looking at a specific subset of my sample?

For example, I want to study the trends of diabetes in my sample given that they have chronic kidney disease (CKD). Because the majority of my sample that has CKD also has diabetes (more than any other traditional risk factor for CKD), I want to study this subset of people and see what their potential risk factors are. I want to examine which trends are associated with (diabetes | CKD).

I want to assess how variables like wealth, education level, residence, sex, etc. affect the prevalence of diabetes (given that they have CKD).

My current code looks like this:

DiabetesGLM <- glm(formula = dta$diabetes ~ dta$Region + dta$sex + dta$wealth + dta$education,
family = poisson(link = log), data = dta, na.action = na.omit)

How can I amend this such that my regression is only assessing these variables for those who have diabetes and CKD? Am I going to have to make another data frame?

The glm function has a subset argument. If you have a column named CKD that contains "yes" when subjects have CKD:

DiabetesGLM <- glm(formula = diabetes ~ Region + sex + wealth + education,
                   family = poisson(link = log), data = dta, na.action = na.omit,
                   subset = dta$CKD == "yes")

Notice I dropped all the dta$ from the formula; they are not needed.
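As a rough check of what the subset argument does, here is a minimal sketch on simulated data. The data frame, its column names, and the "yes"/"no" coding of CKD are all assumptions made for illustration, not the original poster's data. Because subset is evaluated inside data, the column can be named without the dta$ prefix.

```r
# Hypothetical toy data; column names and codings are assumptions for illustration.
set.seed(1)
n <- 200
dta <- data.frame(
  diabetes  = rbinom(n, 1, 0.4),
  Region    = factor(sample(c("north", "south"), n, replace = TRUE)),
  sex       = factor(sample(c("F", "M"), n, replace = TRUE)),
  wealth    = factor(sample(c("low", "mid", "high"), n, replace = TRUE)),
  education = factor(sample(c("primary", "secondary"), n, replace = TRUE)),
  CKD       = sample(c("yes", "no"), n, replace = TRUE)
)

# subset is evaluated within data, so CKD can be referenced directly.
fit_subset <- glm(diabetes ~ Region + sex + wealth + education,
                  family = poisson(link = "log"), data = dta,
                  na.action = na.omit, subset = CKD == "yes")

# Number of rows actually used in the fit.
n_used <- nobs(fit_subset)
```

Comparing n_used with sum(dta$CKD == "yes") is a quick way to confirm that only the CKD rows entered the model.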

Making a new data frame is also a good solution.

I second @FJCC: using a new data frame will help keep the analyst from getting lost.

Assuming that dat has a TRUE/FALSE, 1/0, or YES/NO variable in the fifth position:

diab_only <- dat[,-5]

creates a new data frame with all rows of dat (the [, part), omitting (the - part) column 5 (the 5] part).
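Note that dat[,-5] removes a column; it does not restrict the rows. To keep only the subjects with CKD in a new data frame, one option is a row filter, where the condition goes before the comma. The sketch below uses simulated data, and the names dat, CKD, diabetes, and sex are assumptions for illustration only.

```r
# Hypothetical toy data; the names and the logical coding of CKD are assumptions.
set.seed(2)
dat <- data.frame(
  diabetes = rbinom(100, 1, 0.5),
  sex      = factor(sample(c("F", "M"), 100, replace = TRUE)),
  CKD      = sample(c(TRUE, FALSE), 100, replace = TRUE)
)

# Row filter: the condition sits before the comma, so rows are selected
# and every column is kept. which() also drops rows where CKD is NA.
ckd_only <- dat[which(dat$CKD), ]

# Fit on the new data frame; no subset argument is needed.
fit_ckd <- glm(diabetes ~ sex, family = poisson(link = "log"), data = ckd_only)
```

If CKD were coded as "yes"/"no" or 1/0 instead, the condition would become dat$CKD == "yes" or dat$CKD == 1.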
