How do I calculate ratios with categorical variables in a summary?

christinelly · October 26, 2017, 7:58am

Hello Alison,
Thank you!! I tried it out too but I am not getting the right figures.

install.packages("janitor")  # only needed once
library(janitor)
library(dplyr)               # provides tibble() and %>%

set.seed(1234)

# generate sample data
sleep_cleaned1 <- tibble(
  marital = sample(c("Married", "Divorced", "Widowed", "Separated", "Never married", "A member of an unmarried couple"), 50, replace = TRUE),
  genhlth = sample(c("Excellent", "Very Good", "Good", "Fair", "Poor"), 50, replace = TRUE),
  id      = sample(1:1000, 50, replace = TRUE))

sleep_cleaned1 %>% 
  crosstab(marital, genhlth) %>% 
  adorn_crosstab("row")

martin.R · October 26, 2017, 11:47am

@christinelly, your example adds categories to marital and genhlth, so you will get a different result: the categories are sampled at random purely to generate some example data. If you use the same example as apreshill did, you should get the same result (just make sure you re-run the set.seed(1234) command, too).

christinelly · October 26, 2017, 12:44pm

So if the sampling was just used to create an example and has nothing to do with my code, then what is the code if I remove the sampling?

My code below does not work.
May I please ask what set.seed(1234) does?

install.packages("janitor")
library(janitor)

set.seed(1234)

sleep_cleaned1 <- tibble 
  ((marital = ("Married", "Divorced", "Widowed", "Separated", "Never married", "A member of an unmarried couple"),
  (genhlth = "Excellent", "Very Good", "Good", "Fair", "Poor"))

sleep_cleaned1 %>% 
  crosstab(marital, genhlth) %>% 
  adorn_crosstab("row")

mara · October 26, 2017, 12:55pm

Hi @christinelly. To quote from R Function of the Day:

set.seed(seed)
Set the seed of R's random number generator, which is useful for creating simulations or random objects that can be reproduced.

seed – A number.

Maëlle Salmon did a fun write-up on the use of set.seed among R users on GitHub, which also gives a nice explanation:
http://www.masalmon.eu/2017/04/12/seeds/

martin.R · October 26, 2017, 12:56pm

Calling set.seed just allows random samples to be recreated when you re-run the code. If you left it out, you would get a different sample data set each time you ran the code. You can use any other integer to get a different sample.
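
As a small illustration of that, here is a sketch in base R. The helper `draw_ids` and the seed 4321 are made up for this example and do not come from the thread. Resetting the same seed before sampling reproduces the same draw:

```r
# Hypothetical helper: reset the seed, then draw 5 ids with replacement
draw_ids <- function(seed) {
  set.seed(seed)
  sample(1:1000, 5, replace = TRUE)
}

first  <- draw_ids(1234)
second <- draw_ids(1234)  # same seed as `first`
other  <- draw_ids(4321)  # a different seed starts a different random sequence

identical(first, second)  # TRUE: same seed, same sample
```

Without the set.seed() call inside the helper, `first` and `second` would generally differ from one run to the next.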

As only you have the original data, LVG77 generated an example data set earlier in the thread to illustrate working code. In your earlier example you changed the categories, so you got a different result.

christinelly · October 26, 2017, 1:11pm

Thank you for your help, it is now clear to me what set.seed is used for.

My question is still not answered, though. My code does not run with the suggested code. Or is that code supposed to be used just for sample data and not for the original data?

martin.R · October 26, 2017, 1:14pm

Yes, just for sample data in this case.
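
To spell that out with a rough sketch: with real data you skip sample() and set.seed() entirely and tabulate the data frame you already have. The six rows below are made up as a stand-in for your data, and the sketch uses base R's table() and prop.table() rather than janitor, so it should not depend on which janitor version is installed. If you do type values in by hand, each column needs c(...) around its values and all columns need the same length, which the attempt above does not satisfy.

```r
# Hypothetical stand-in for the real data: typed in by hand, no sampling involved
sleep_cleaned1 <- data.frame(
  marital = c("Married", "Married", "Divorced", "Widowed", "Married", "Divorced"),
  genhlth = c("Good", "Fair", "Good", "Poor", "Good", "Fair")
)

# Cross-tabulate counts: marital status in rows, general health in columns
counts <- table(sleep_cleaned1$marital, sleep_cleaned1$genhlth)

# Row proportions: each row sums to 1 (margin = 1 means "within rows")
row_props <- prop.table(counts, margin = 1)

# Show as percentages rounded to one decimal place
round(100 * row_props, 1)
```

With your own data, replace the data.frame(...) step with however you load the data set (for example read.csv) and keep the last three steps as they are.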

christinelly · October 26, 2017, 1:45pm

Perfect, the code makes sense now. Thanks!