How do I calculate ratios with categorical variables in a summary?

Hello,

I am new to R. Please help me out with this embarrassing question.

How do I get the percentages (ratios) of the data shown in the graph as a summary table? I need the marital status and the share in % for each health category. I tried some code but failed miserably. Please advise.

Thanks in advance
Christine

I am looking for a table similar to this:
Marital   Excellent  Very good
Married   %          %
Divorced  %          %
Widowed

My non-working code:

sleep_cleaned %>%
  group_by(marital, genhlth) %>%
  tally() %>%
  mutate(x = n / sum(n)) %>%
  summary(x= factor(marital), y= factor(genhtlh))

Try this:

sleep_cleaned %>%
  group_by(marital, genhlth) %>%
  summarise(x = n / sum(n))

Thank you, but I get an error: Error in summarise_impl(.data, dots) : Evaluation error: invalid 'type' (closure) of argument.
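
For context, this error most likely arises because the grouped data has no column called n at that point, so n resolves to dplyr's n() function (a closure), which sum() cannot add up. A minimal sketch of one way around it, assuming made-up stand-in data (the real sleep_cleaned is not shown in this thread) and dplyr 1.0 or later for the .groups argument:

```r
library(dplyr)

# Made-up stand-in data; the real sleep_cleaned is not shown in the thread.
sleep_cleaned <- tibble(
  marital = c("Married", "Married", "Married", "Divorced", "Divorced"),
  genhlth = c("Excellent", "Excellent", "Very good", "Excellent", "Very good")
)

row_props <- sleep_cleaned %>%
  group_by(marital, genhlth) %>%
  # Create the count column first, keeping the grouping by marital.
  summarise(n = n(), .groups = "drop_last") %>%
  # n is now a real column, so this is the share within each marital group.
  mutate(prop = n / sum(n))

print(row_props)
```

Because the result stays grouped by marital, the proportions add up to 1 within each marital status rather than across the whole table.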

Sorry, I didn't look closely enough:

sleep_cleaned %>%
 count(marital, genhlth) %>%
 mutate(prop = prop.table(n))

Hello Martin,
Thank you very much! It does not seem to partition over the right groups, as the proportions do not add up to 100% within each marital status.
I am looking for the number of married respondents in excellent genhlth divided by the total number of married respondents.

Also, it does not make the data very easy to read. Is there a way to avoid repeating the variables?

Something like this?

Marital Excellent Very good
Married % %
Divorced % %
Widowed

Ok, I was guessing unsuccessfully at what your data looks like.

Could you please post a reproducible example and your expected output?

Check this link for a reprex.

Here is a quick reprex and a possible answer:

suppressPackageStartupMessages(library(tidyverse))
set.seed(1234)

# generate sample data
sleep_cleaned <- tibble(
  marital = sample(c("Married", "Divorced", "Widowed"), 50, TRUE),
  genhlth = sample(c("Excellent", "Very Good", "Fair"), 50, TRUE),
  id      = sample(1:1000, 50, TRUE))


# generate frequency table: count each combination, then take the
# share of each health category within its marital group
sleep_cleaned %>%
  count(marital, genhlth) %>%
  group_by(marital) %>%
  mutate(prop = n / sum(n)) %>%
  select(-n) %>%
  spread(key = genhlth, value = prop)
#> # A tibble: 3 x 4
#> # Groups: marital [3]
#>   marital  Excellent  Fair `Very Good`
#> * <chr>        <dbl> <dbl>       <dbl>
#> 1 Divorced     0.400 0.267      0.333 
#> 2 Married      0.609 0.174      0.217 
#> 3 Widowed      0.500 0.417      0.0833
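
As a side note, in tidyr 1.0.0 and later spread() is superseded by pivot_wider(). The same table can be sketched as below, assuming the same kind of sample data; the pct column name and the rounding to one decimal are illustrative choices, not part of the original answer:

```r
library(dplyr)
library(tidyr)

set.seed(1234)

# Sample data of the same shape as in the reprex above.
sleep_cleaned <- tibble(
  marital = sample(c("Married", "Divorced", "Widowed"), 50, TRUE),
  genhlth = sample(c("Excellent", "Very Good", "Fair"), 50, TRUE)
)

freq_wide <- sleep_cleaned %>%
  count(marital, genhlth) %>%
  group_by(marital) %>%
  # Percentage of each health category within its marital group.
  mutate(pct = round(100 * n / sum(n), 1)) %>%
  ungroup() %>%
  select(-n) %>%
  # One row per marital status, one column per health category;
  # combinations that never occur are filled with 0.
  pivot_wider(names_from = genhlth, values_from = pct,
              values_fill = list(pct = 0))

print(freq_wide)
```

Each row of freq_wide should add up to roughly 100, apart from rounding.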

Hello,
Thank you, but I have now run the first part of your code:
sleep_cleaned <- tibble(
  marital = sample(c("Married", "Divorced", "Widowed"), 50, TRUE),
  genhlth = sample(c("Excellent", "Very Good", "Fair"), 50, TRUE),
  id      = sample(1:1000, 50, TRUE))

and I want to return my dataset to what it was. How do I do that?

Oh I am sorry. I used this first portion only to build sample data for the example. You can completely ignore it as you already have the real data. Cheers.

I ran the code and I want my data back to what it was. Can you please tell me how to restore my data frame?

OK, I just ran my code again. Is there no way in R to undo?

You will have to re-run your script from the beginning.
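
Since an assignment in R replaces the object without any undo history, one common habit is to keep the loaded data under its own name and only ever modify a copy. A minimal base R sketch, where sleep_original is a hypothetical name and the small data frame stands in for whatever step loads the real data:

```r
# Hypothetical stand-in for the step that loads the real data,
# for example a read.csv() call on your own file.
sleep_original <- data.frame(
  marital = c("Married", "Divorced", "Widowed"),
  genhlth = c("Excellent", "Very good", "Fair")
)

# Work on a copy so the original object is never modified.
sleep_cleaned <- sleep_original

# Accidentally overwrite the working copy ...
sleep_cleaned <- data.frame(marital = "oops", genhlth = "oops")

# ... and recover it from the untouched original.
sleep_cleaned <- sleep_original
```

This does not replace re-running the script, but it keeps a single accidental assignment from destroying the only copy of the data in the session.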

OK, got it. I tried your code, but it gives me an error: Error in spread(., key = genhlth, value = prop) : could not find function "spread"

You have to load the necessary library first. Put this line in the beginning of your script and re-run it:
library(tidyverse)

Hello,

I am encountering some difficulties with reprex, so I can't produce an example.
Error: Install these packages in order to use the reprex addin:
shinyjs
In addition: Warning message:
In flat_str(content, breaks) : Coercing content to character

Don't worry about that. It's no longer required.

I was merely asking for a reproducible example because I didn't understand what you wanted, but LVG77 did and provided the example data to produce the solution.

aahh perfect!!! That was exactly what I needed. Was not as easy as I thought. So it was not possible without downloading extra packages?
Thank you so much.

The tidyverse is a convenient package of packages.

You were already using dplyr and ggplot2. The spread() function comes from tidyr, so you could either load each package individually or load tidyverse on its own, in which case you don't need to load dplyr or ggplot2 separately.

This might help to explain the different packages for you:
https://www.tidyverse.org/packages/
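
On the question of whether extra packages are strictly needed: a similar row-percentage table can be sketched in base R alone with table() and prop.table(). The small data frame below is made up for illustration, since the real sleep_cleaned is not shown in this thread:

```r
# Made-up stand-in data; replace with your own data frame.
sleep_cleaned <- data.frame(
  marital = c("Married", "Married", "Married", "Married",
              "Divorced", "Divorced", "Widowed"),
  genhlth = c("Excellent", "Excellent", "Excellent", "Very good",
              "Excellent", "Very good", "Very good")
)

# Cross-tabulate the counts: rows are marital status, columns are genhlth.
counts <- table(sleep_cleaned$marital, sleep_cleaned$genhlth)

# margin = 1 divides each cell by its row total, so every row sums to 100%.
row_pct <- round(100 * prop.table(counts, margin = 1), 1)

print(row_pct)
```

The tidyverse version is easier to extend, but for a quick look at row percentages the base R form may be enough.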

Great, thank you for the clarifications. It is very helpful!

Wanted to plug one of my favorite tidyverse-friendly packages here, janitor:

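# install.packages("janitor")  # run once if janitor is not installed
library(tidyverse)
library(janitor)

set.seed(1234)

# generate sample data
sleep_cleaned <- tibble(
  marital = sample(c("Married", "Divorced", "Widowed"), 50, TRUE),
  genhlth = sample(c("Excellent", "Very Good", "Fair"), 50, TRUE),
  id      = sample(1:1000, 50, TRUE))

# crosstab() and adorn_crosstab() were removed in later janitor releases;
# tabyl() with the adorn_* helpers is the current equivalent
sleep_cleaned %>%
  tabyl(marital, genhlth) %>%
  adorn_percentages("row") %>%
  adorn_pct_formatting(digits = 1) %>%
  adorn_ns()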

Delivers:

marital   Excellent   Fair       Very Good
Divorced  40.0% (6)   26.7% (4)  33.3% (5)
Married   60.9% (14)  17.4% (4)  21.7% (5)
Widowed   50.0% (6)   41.7% (5)  8.3% (1)