Calculate prevalence rates using R

hayho · December 24, 2022, 4:18pm

Hello everyone,
I am VERY new to R and can only do the most basic things on there. I am currently working on a project. Unfortunately I cannot contact my supervisor at the moment, so I just wanted to see what I could do on my own until I can meet him again.

About the data:
My goal is to calculate the prevalence rates of overweight and obesity in a cohort of school kids in Sweden by year (2000-2005). I have the BMI z-scores. I am now looking for a simple way to calculate the prevalence rates for every year.

My approach was to somehow transform the bmi z-scores into a binary variable (bmi z-scores for obesity being 1, and for overweight 0) and then create prop.tables? Would that work? Is there perhaps a better and more elegant way to calculate prevalence rates?
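The approach described above could be sketched roughly as follows. This is only an illustration, not a definitive method: the cutoffs (z > 1 for overweight, z > 2 for obesity) are assumed WHO-style thresholds and may not match the reference used to compute the z-scores in this cohort, and the data frame name `kids` and the column `weight_status` are hypothetical. The five rows are the sample values posted later in this thread.

```r
# Five sample rows (year and BMI z-score) from the dput() output in this thread.
kids <- data.frame(
  year    = c(2002, 2005, 2002, 2002, 2005),
  bmi_sds = c(2.68651915075224, 3.01580016406978, 1.59865327440851,
              0.230053325961097, -0.533827159019426)
)

# ASSUMED cutoffs: replace them with the ones your growth reference defines.
overweight_cut <- 1
obese_cut      <- 2

# cut() turns the z-score into an ordered set of categories.
# With the default right = TRUE the intervals are (-Inf, 1], (1, 2], (2, Inf).
kids$weight_status <- cut(
  kids$bmi_sds,
  breaks = c(-Inf, overweight_cut, obese_cut, Inf),
  labels = c("not overweight", "overweight", "obese")
)

# Counts per year and category, then row-wise proportions (each year sums to 1).
counts     <- table(kids$year, kids$weight_status)
prevalence <- prop.table(counts, margin = 1)

# Prevalence in percent, one row per year.
round(100 * prevalence, 1)
```

Using three categories rather than a 0/1 variable keeps children who are neither overweight nor obese in the denominator, which a prevalence rate needs.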

Thank you so much in advance, I hope you understand what my goal is. English is not my first language, so this wasn't easy for me. Anyway. Thank you and have a wonderful Christmas, if you celebrate it.

jrkrideau · December 24, 2022, 7:32pm

Hi, welcome to the forum. Astute advisors know when to disappear!

The answer to your question probably depends on your data layout. Could you supply some sample data? A handy way to do that is the dput() function. Just run dput(mydata), where mydata is the name of your data.frame or tibble, etc., and paste the output here. For a large dataset, something like dput(head(mydata, 100)) should supply the data we need.
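As a small illustration of what dput() produces, here is a sketch with a made-up data frame; the name `mydata` and its columns are placeholders, not the poster's real data.

```r
# A tiny, made-up data frame standing in for your real data.
mydata <- data.frame(
  id    = c("a", "b", "c"),
  score = c(1.5, 2.0, 3.25)
)

# Prints R code that recreates the whole object; paste that text into your post.
dput(mydata)

# For a large data set, share only the first rows (here 2; use 100 for real data).
dput(head(mydata, 2))

# Anyone can rebuild the object from the pasted text.
txt      <- capture.output(dput(mydata))
restored <- eval(parse(text = txt))
all.equal(mydata, restored)
```

The pasted structure(...) text preserves column types, which a copied-and-pasted table of numbers does not.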

You may also find FAQ: How to do a minimal reproducible example ( reprex ) for beginners or FAQ: Tips for writing R-related questions helpful.

hayho · December 25, 2022, 9:24am

Thank you so much for the response! I did what you asked me to, here is the data:

structure(list(subject_id = c("d5", "d51", "4d0", "4tg", "c8c"),
sex = c(1L, 1L, 0L, 0L, 0L),
year = c(2002, 2005, 2002, 2002, 2005),
height = c(158, 158, 169, 140, 138),
weight = c(82.5, 88, 71.8000030517578, 49.7999992370605, 29.7000007629395),
age = c(14.7808219178082, 15.8575342465753, 14.4054794520548, 13.9260273972603, 9.42076502732241),
bmi = c(32.7154540667094, 35.2507610959782, 25.5609153735218, 19.8481034599776, 15.5954635386156),
bmi_sds = c(2.68651915075224, 3.01580016406978, 1.59865327440851, 0.230053325961097, -0.533827159019426)),
row.names = c(NA, 5L), class = "data.frame")

I deleted some irrelevant information and only included 5 data entries because R gave me this huge paragraph of text. I hope you can help me with only the information provided above.

jrkrideau · December 25, 2022, 1:37pm

Great, thanks. This looks like a good start, but I suspect that a bit more data would be helpful. How many people are in your data set? If it is a large number, then something like dput(head(mydata, 100)) would give us a bit more to work with. It is not all that unusual to see someone post 1,000 lines of data or provide a link to a data source.

Are all the cases in your data set considered to be obese or overweight? If not, dichotomizing the data is throwing away information. For that matter, dichotomizing is probably not a great idea in any case.
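One way to look at the data without dichotomizing is to summarize the z-scores themselves by year. The sketch below is only a starting point under stated assumptions: `kids`, `by_year`, and `mean_bmi_sds` are hypothetical names, and the five rows are the sample values from the dput() output above.

```r
# Year and BMI z-score for the five sample rows posted in this thread.
kids <- data.frame(
  year    = c(2002, 2005, 2002, 2002, 2005),
  bmi_sds = c(2.68651915075224, 3.01580016406978, 1.59865327440851,
              0.230053325961097, -0.533827159019426)
)

# Mean z-score per year; aggregate() returns one row per year, sorted by year.
by_year <- aggregate(bmi_sds ~ year, data = kids, FUN = mean)
names(by_year)[2] <- "mean_bmi_sds"

# Number of children per year (table() is sorted by year as well).
by_year$n <- as.vector(table(kids$year))

by_year

# A quick visual check of the whole distribution by year.
boxplot(bmi_sds ~ year, data = kids, ylab = "BMI z-score")
```

With the full data set, the per-year distribution shows shifts that a single cutoff-based percentage can hide.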

If you are just getting started with this data set, you might want to have a look at some of the suggestions in Chapter 7 of R for Data Science.

What is the bmi_sds variable?
