# Summary of categorical data

Hi, when I used the `summary()` function on a data frame containing both numerical and categorical variables, the summary of the categorical variables showed only the length, class and mode. I was expecting to see a summary in terms of the levels. Do I need to install a package to see that, or is there some other problem?

I'd need to know what you expect to get from such a function. I'm going to assume you need to compute a level-wise summary of the data using a list of functions (here I use the mean and SD as an example).

You can try with the combination of `group_by()` and `summarise()` from the package dplyr. I don't have the code or data you are using but it would look like this:

``````
library(dplyr)

data %>%
  # group observations from the same level of categorical_variable
  group_by(categorical_variable) %>%
  summarise(
    mn1  = mean(numeric_variable1), # mean of numeric_variable1 for each level of categorical_variable
    std1 = sd(numeric_variable1),
    mn2  = mean(numeric_variable2),
    std2 = sd(numeric_variable2)
  )
``````

You can also group observations by combinations of levels from two categorical variables using `group_by(categorical_variable1, categorical_variable2)`.
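As a rough sketch of the two-variable grouping, here is the same pattern on the built-in `mtcars` data set, used purely as a stand-in for your data. Treating `cyl` and `am` as the categorical variables and `mpg` as the numeric one is my assumption, and the output column names (`mn_mpg`, `std_mpg`, `n`) are just illustrative.

```r
library(dplyr)

by_cyl_am <- mtcars %>%
  # treat the two grouping columns as categorical
  mutate(cyl = factor(cyl), am = factor(am)) %>%
  # one group per observed combination of cyl and am
  group_by(cyl, am) %>%
  summarise(
    mn_mpg  = mean(mpg),  # mean of mpg within each combination
    std_mpg = sd(mpg),    # standard deviation within each combination
    n       = n(),        # number of rows in each combination
    .groups = "drop"      # return an ungrouped tibble
  )

by_cyl_am
```

The result has one row per combination of levels that actually occurs in the data, so combinations with no observations simply do not appear.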

Hope this addresses your problem! NOTE: Edited to indent the code properly.

If you run `summary(iris)`, you'll see:

``````
> summary(iris)
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width          Species
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100   setosa    :50
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300   versicolor:50
 Median :5.800   Median :3.000   Median :4.350   Median :1.300   virginica :50
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
``````

The column `Species` is a factor (not just a vector of characters), so the summary breaks it down by level.
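To check which case you are in, you can compare how `summary()` treats the same values stored as characters and as a factor. This is only a small sketch on made-up data; `pets`, `species` and `weight` are hypothetical names, not anything from your data set.

```r
# Hypothetical toy data; the names are illustrative only
pets <- data.frame(
  species = c("cat", "dog", "cat", "bird"),
  weight  = c(4.2, 11.0, 3.8, 0.1),
  stringsAsFactors = FALSE  # the default since R 4.0.0, written out for clarity
)

class(pets$species)  # tells you whether the column is character or factor

# Character column: summary() reports Length / Class / Mode only
chr_summary <- summary(pets$species)
chr_summary

# Same values as a factor: summary() reports one count per level
fct_summary <- summary(factor(pets$species))
fct_summary
```

Running `str()` on your own data frame gives the same information for every column at once.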

I suspect that the data frame you're considering doesn't have factors. Is that part of some tutorial you're following? Previous versions of `R` (< 4.0.0) used to automatically turn strings into factors. This is no longer the case. So if you're following old-ish code, the behaviour will be a bit different; you'll see vectors of characters where you probably expect factors.

Either use `stringsAsFactors = TRUE` when building the data frames, or set the global option to TRUE with `options(stringsAsFactors = TRUE)` to emulate what would happen in `R < 4.0.0`. Note that the global option is deprecated, so the per-call argument is the more durable choice.
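A minimal sketch of the per-call approach, plus a way to convert character columns in a data frame that already exists. The data and the names `pets_fct` and `pets_chr` are made up for illustration.

```r
# Option 1: request factors when the data frame is built
pets_fct <- data.frame(
  species = c("cat", "dog", "cat", "bird"),
  weight  = c(4.2, 11.0, 3.8, 0.1),
  stringsAsFactors = TRUE
)
summary(pets_fct)  # species is now summarised by level

# Option 2: convert the character columns of an existing data frame
pets_chr <- data.frame(
  species = c("cat", "dog", "cat", "bird"),
  weight  = c(4.2, 11.0, 3.8, 0.1),
  stringsAsFactors = FALSE
)
chr_cols <- vapply(pets_chr, is.character, logical(1))    # flag character columns
pets_chr[chr_cols] <- lapply(pets_chr[chr_cols], factor)  # convert only those columns
summary(pets_chr)
```

`read.csv()` and `read.table()` accept the same `stringsAsFactors` argument, so the first option also applies when the data comes from a file.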

Thank you for the suggestion. I am using R > 4.0.0 but was expecting the strings to turn into factors automatically. After setting the global option to TRUE, it works now!

Thank you for the reply. I was expecting the strings to turn into factors automatically and the summary to be shown accordingly. Anyway, I have followed the suggestion given by ChrisL and it works now.
