How do I calculate ratios with categorical variables in a summary?


Hello Alison,
Thank you!! I tried it out too but I am not getting the right figures.



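# load packages: tibble() is from tibble, %>% from dplyr,
# crosstab() from janitor (older releases; newer ones use tabyl())
library(tibble)
library(dplyr)
library(janitor)

# generate sample data
sleep_cleaned1 <- tibble(
  marital = sample(c("Married", "Divorced", "Widowed", "Separated", "Never married", "A member of an unmarried couple"), 50, replace = TRUE),
  genhlth = sample(c("Excellent", "Very Good", "Good", "Fair", "Poor"), 50, replace = TRUE),
  id      = sample(1:1000, 50, replace = TRUE)
)

# cross-tabulate marital status against general health
sleep_cleaned1 %>%
  crosstab(marital, genhlth)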


@christinelly, your example has additional categories for marital and genhlth, so you will get a different result: these categories are sampled randomly purely to generate some example data. If you use the same example as apreshill did, you should get the same result (just make sure you re-run the set.seed(1234) command, too).


So if the sampling was just used to create an example and has nothing to do with my code, then what is the code if I remove the sampling?

My code below does not work.
May I please ask what set.seed(1234) does?



sleep_cleaned1 <- tibble 
  ((marital = ("Married", "Divorced", "Widowed", "Separated", "Never married", "A member of an unmarried couple"),
  (genhlth = "Excellent", "Very Good", "Good", "Fair", "Poor"))

sleep_cleaned1 %>% 
  crosstab(marital, genhlth) %>% 


Hi @christinelly. To quote from R Function of the Day:

Set the seed of R's random number generator, which is useful for creating simulations or random objects that can be reproduced.

  • seed – A number.

Maëlle Salmon did a fun write-up on the use of set.seed among R users on GitHub, which also gives a nice explanation.


Defining set.seed just allows random samples to be recreated when you re-run the code. If you left it out you would get a different sample data set each time you ran the code. You can use any other integer to get a different sample.
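
To make that concrete, here is a minimal sketch in base R. The object names (first_draw, second_draw, other_draw) are just made up for the illustration, and the health categories are borrowed from the example data in this thread.

```r
# The categories to sample from (taken from the example data above)
levels_genhlth <- c("Excellent", "Very Good", "Good", "Fair", "Poor")

# Same seed before each call: the two draws are identical
set.seed(1234)
first_draw <- sample(levels_genhlth, 10, replace = TRUE)

set.seed(1234)
second_draw <- sample(levels_genhlth, 10, replace = TRUE)

identical(first_draw, second_draw)  # TRUE

# A different seed gives a different, but equally reproducible, sample
set.seed(42)
other_draw <- sample(levels_genhlth, 10, replace = TRUE)
```

If you drop the set.seed() calls, each run of sample() continues from wherever the random number generator currently is, so the result changes from run to run.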

As only you have the original data, LVG77 generated an example data set earlier in the thread to illustrate working code. In your earlier example you changed the categories, so you got a different result.


Thank you for your help, it is now clear to me what set.seed is used for.

My question is still not answered, though. My code does not run with the suggested code. Or is the code supposed to be used just for sample statistics and not for original data?


Yes, just for sample data in this case.
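
With your original data you would skip the data-generation step entirely and run the tabulation on the data frame you already have. The attempt without sample() fails for two reasons: a character vector has to be wrapped in c(), and every column needs the same number of values, because each row is one respondent rather than one category. Here is a rough sketch using only base R (table() and prop.table() rather than the janitor functions used earlier in the thread); the six rows below are hypothetical stand-ins for real survey records.

```r
# Hypothetical stand-in for the real data: one row per respondent.
# With your own data you would not type this in; you would use the
# data frame you already loaded.
sleep_cleaned1 <- data.frame(
  marital = c("Married", "Married", "Divorced", "Widowed", "Never married", "Married"),
  genhlth = c("Good", "Excellent", "Fair", "Good", "Good", "Poor")
)

# Counts for every marital status / general health combination
counts <- table(sleep_cleaned1$marital, sleep_cleaned1$genhlth)

# Ratios within each marital status (each row sums to 1)
row_ratios <- prop.table(counts, margin = 1)

round(row_ratios, 2)
```

Using margin = 2 instead would give the ratios within each health category, and leaving margin out gives each cell as a share of the whole table.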


Perfect, the code makes sense now. Thanks!