Comparing means of categorical data

I am trying to compare means of categorical data, namely education (an integer) grouped by sex (Male and Female). I used the following code to view the means:

gss %>%
  group_by(sex) %>%
  summarise(xbar = mean(educ))

This code provides a table:

Sex xbar
Male NA
Female NA

I don't understand why this is the case, since my education (educ) vector has numerical values and only very few NAs.

Hi @SL_Mabaso

By default, mean() returns NA whenever its input contains any NA, so the result would be the same even if there were only a single NA value in each category. If you want to ignore those values, add the na.rm = TRUE argument to the mean() function:

gss %>%
  group_by(sex) %>%
  summarise(xbar = mean(educ, na.rm = TRUE))
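
To see the effect in isolation, here is a minimal sketch on made-up data. The `toy` data frame and its values are hypothetical stand-ins for gss, and base R's `tapply()` is used in place of the dplyr pipeline only to keep the example free of package dependencies; the `na.rm = TRUE` argument behaves the same way inside `summarise()`.

```r
# Toy data standing in for gss (hypothetical values, not the real dataset)
toy <- data.frame(
  sex  = c("Male", "Male", "Female", "Female"),
  educ = c(12, NA, 16, 14)
)

# A single missing value is enough to make the overall mean NA
overall_default <- mean(toy$educ)

# With na.rm = TRUE the NA is dropped before averaging
overall_clean <- mean(toy$educ, na.rm = TRUE)

# Group means with NAs dropped; extra arguments are passed on to mean()
group_means <- tapply(toy$educ, toy$sex, mean, na.rm = TRUE)

# Number of missing values per group, useful to check how much data is dropped
na_counts <- tapply(is.na(toy$educ), toy$sex, sum)

print(group_means)
print(na_counts)
```

Checking the NA counts per group before dropping them is a cheap way to confirm that "very few" missing values really is the case in each category.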

If the education variable is really categorical and merely coded with numbers, then calculating its mean is statistically meaningless.

Use the str function to determine whether educ is a factor or a number:

str(gss$educ)

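If it looks like a number but is nevertheless a factor, then convert it to numeric in order to calculate its mean. Convert through character first, because calling as.numeric() directly on a factor returns the internal level codes rather than the original values:

gss$educ <- as.numeric(as.character(gss$educ))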
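
One caveat when converting a factor: `as.numeric()` applied directly to a factor returns the internal level codes, not the numbers shown in the labels. The sketch below illustrates the difference on a hypothetical factor (`educ_factor` is made up for illustration and is not taken from gss).

```r
# Hypothetical factor whose labels are years of education
educ_factor <- factor(c("12", "16", "8", "12"))

# Direct conversion yields the internal level codes (positions in levels())
codes <- as.numeric(educ_factor)

# Going through character first recovers the labels as numbers
values <- as.numeric(as.character(educ_factor))

print(levels(educ_factor))
print(codes)
print(values)

# The two means differ, so the conversion route matters
mean_of_codes  <- mean(codes)
mean_of_values <- mean(values)
```

If `str(gss$educ)` already reports a numeric or integer vector, no conversion is needed at all.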

Thank you, this way was much easier for me to use.
