Hi, when I used the summary() function on a data frame (containing both numerical and categorical variables), the summary of the categorical columns showed only the length, class and mode of the variables. I was expecting to see the summary in terms of the levels. Do I need to install a package to see that, or is there some other problem?
I'd need to know what you expect to get from such a function. I'm going to assume you need to compute a level-wise summary of the data using a list of functions (here I use the mean and SD as an example).
You can try the combination of group_by() and summarise() from the package dplyr. I don't have the code or data you are using, but it would look like this:
library(dplyr)
data %>%
  group_by(categorical_variable) %>%  # group observations from the same level of categorical_variable
  summarise(
    mn1  = mean(numeric_variable1),   # mean of numeric_variable1 for each level of categorical_variable
    std1 = sd(numeric_variable1),     # standard deviation of numeric_variable1 for each level
    mn2  = mean(numeric_variable2),
    std2 = sd(numeric_variable2)
  )
You can also group observations by combinations of levels from two categorical variables using group_by(categorical_variable1, categorical_variable2).
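To make the pattern above concrete, here is a minimal sketch on R's built-in datasets, since the original data isn't available. The choice of iris and mtcars, and the output names by_species and by_cyl_am, are my own assumptions for illustration. It assumes dplyr 1.0.0 or later (for the .groups argument).

```r
library(dplyr)

# Level-wise mean and SD of two numeric columns: one row per level of Species
by_species <- iris %>%
  group_by(Species) %>%
  summarise(
    mn1  = mean(Sepal.Length),
    std1 = sd(Sepal.Length),
    mn2  = mean(Petal.Length),
    std2 = sd(Petal.Length)
  )
print(by_species)

# Grouping on a combination of two variables (cyl and am are stored as
# numbers in mtcars but are treated as categories here)
by_cyl_am <- mtcars %>%
  group_by(cyl, am) %>%
  summarise(mn_mpg = mean(mpg), std_mpg = sd(mpg), .groups = "drop")
print(by_cyl_am)
```

The first result has one row per species; the second has one row per observed cyl/am combination.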
Hope this addresses your problem!
NOTE: Edited to indent the code properly.
If you run summary(iris), you'll see:
> summary(iris)
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width          Species
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100   setosa    :50
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300   versicolor:50
 Median :5.800   Median :3.000   Median :4.350   Median :1.300   virginica :50
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
The column Species is a factor (not just a vector of characters), so the summary breaks it down by level.
I suspect that the data frame you're considering doesn't have factors. Is that part of some tutorial you're following? Previous versions of R (< 4.0.0) used to automatically turn strings into factors when building a data frame. This is no longer the case. So if you're following old-ish code, the behaviour will be a bit different; you'll see vectors of characters where you probably expect factors.
Either use stringsAsFactors = TRUE when building the data frames, or set the global option to TRUE with options(stringsAsFactors = TRUE) to emulate what would happen in R < 4.0.0.
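As a rough illustration of the difference, here is a small sketch with a made-up data frame (df, df2, score and group are hypothetical names, not from the original question). It uses the per-call argument and an explicit factor() conversion rather than the global option, which R 4.x reports as deprecated when set to TRUE.

```r
# Hypothetical toy data; in R >= 4.0.0 the 'group' column stays character
df <- data.frame(
  score = c(1.5, 2.0, 3.5, 4.0),
  group = c("a", "b", "a", "b")
)
sapply(df, class)     # check which columns are character vs factor
summary(df$group)     # character column: reports Length / Class / Mode

# Option 1: convert an existing character column to a factor
df$group <- factor(df$group)
summary(df$group)     # factor column: reports a count per level

# Option 2: ask data.frame() to convert strings while building
df2 <- data.frame(
  score = c(1.5, 2.0, 3.5, 4.0),
  group = c("a", "b", "a", "b"),
  stringsAsFactors = TRUE
)
summary(df2)          # 'group' is now summarised by level
```

Converting only the columns that are genuinely categorical (Option 1) avoids turning free-text columns such as names or comments into factors by accident.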
Thank you for the suggestion. I am using R > 4.0.0 but was expecting the strings to automatically turn into factors. After setting the global option to TRUE, it works now!
Thank you for the reply. I was expecting the strings to automatically turn into factors and the summary to be shown accordingly. Anyway, I have followed the suggestion given by ChrisL and it works now.