Hi all! I'm trying to convert a table with frequencies into case form, with one row per individual data point. I was using expand.dft(), but it seems to just drop the frequency column altogether rather than expanding the rows. Any ideas?

Can you run summary() and head() on Frequency right after you load it from the Excel sheet?
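One plausible reason to ask for these is to check the column's type and whether the counts are whole numbers, since row expansion needs an integer count per row. A minimal sketch, assuming a hypothetical data frame `industries` with made-up values standing in for the Excel sheet:

```r
# Hypothetical industry-level table; values are made up for illustration.
industries <- data.frame(
  Industry  = c("A", "B", "C", "D"),
  Frequency = c(13.2, 4.5, 20.1, 7.8)  # people, in thousands
)

print(summary(industries$Frequency))  # range, quartiles, and any NAs
print(head(industries$Frequency))
print(class(industries$Frequency))    # should be "numeric", not "character"

# Row expansion needs a whole-number count in every row.
is_whole <- all(industries$Frequency == round(industries$Frequency))
print(is_whole)                       # FALSE here: 13.2 cannot become 13.2 rows
```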

What would a frequency of 13.2 mean in this context?

How would it benefit your workflow to uncount rather than work with weighted data?

Frequency is the number of people (in thousands). So I was hoping to create a dataset that lists every person individually. I tried making the frequencies whole numbers, but unfortunately that didn't work either.

I think you need to realise that there's no benefit to doing that.

Say I tell you I have a dataset with an entry that has a frequency of two and an average height of 6ft.

Would you unpack that into a two-row table with anything other than 6ft in the height fields?

There's no way for you to know or restore that one person was 6ft 1in and the other was 5ft 11in. That information was lost when the data were summarised.
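The height example can be made concrete in a few lines. This is a sketch with two hypothetical people (heights in inches), showing that expanding the summary row brings back the count but not the spread:

```r
# Two hypothetical people: 6ft 1in and 5ft 11in, in inches.
people <- data.frame(height_in = c(73, 71))

# Summarise to one row: frequency 2, average height 72 inches (6ft).
summary_row <- data.frame(freq = nrow(people), mean_height = mean(people$height_in))

# Expand the summary back out: every restored row gets the average.
restored <- summary_row[rep(1, summary_row$freq), "mean_height"]

print(restored)              # two rows, both 72
print(sd(people$height_in))  # spread in the original data (sqrt(2))
print(sd(restored))          # 0: the spread cannot be recovered
```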

These are not averages, though. My aim is to plot the % chance someone has lost their job against the % chance they have tertiary education. If I could expand out each individual, I could check whether the two variables are correlated. At the moment, though, each industry is weighted equally regardless of how many people work in it, which would distort the correlation.

I believe there are several R packages that offer weighted correlation.
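As one option that needs no extra package, base R's stats::cov.wt() accepts weights and can return a correlation matrix (this is swapped in for illustration, not one of the packages alluded to above). A minimal sketch with a hypothetical `industries` table and made-up rates, comparing the unweighted and head-count-weighted Pearson correlations:

```r
# Hypothetical industry-level data (made-up values): head count in thousands,
# % who lost their job, and % with tertiary education.
industries <- data.frame(
  Industry     = c("A", "B", "C", "D"),
  Frequency    = c(13.2, 4.5, 20.1, 7.8),
  lost_job_pct = c(12, 25, 8, 18),
  tertiary_pct = c(40, 15, 55, 30)
)

vars <- industries[, c("lost_job_pct", "tertiary_pct")]

# Unweighted: every industry counts equally, regardless of size.
r_unweighted <- cor(vars$lost_job_pct, vars$tertiary_pct)

# Weighted by head count. cov.wt() rescales the weights to sum to 1,
# so fractional frequencies in thousands are fine as they are.
r_weighted <- cov.wt(vars, wt = industries$Frequency, cor = TRUE)$cor[1, 2]

print(c(unweighted = r_unweighted, weighted = r_weighted))
```

With whole-number weights, this weighted Pearson coefficient is the same value cor() gives on the row-expanded data, so expanding is not required just to get the correlation.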

Otherwise, multiply the frequency so that the result is a whole number, i.e. a whole number of rows. Then tidyr's uncount() can be used.
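A sketch of the rescale-then-expand approach, assuming a hypothetical `industries` table with made-up values and a scale factor of 10 (enough to clear one decimal place). The expansion uses base R indexing so it runs without extra packages; tidyr::uncount(industries, count) should give the same rows and, by default, drop the count column:

```r
# Hypothetical industry-level table (made-up values); Frequency is in thousands.
industries <- data.frame(
  Industry     = c("A", "B", "C", "D"),
  Frequency    = c(13.2, 4.5, 20.1, 7.8),
  lost_job_pct = c(12, 25, 8, 18),
  tertiary_pct = c(40, 15, 55, 30)
)

# Multiply by 10 so every frequency becomes a whole number (13.2 -> 132).
# round() guards against floating-point results such as 131.99999...
industries$count <- round(industries$Frequency * 10)

# Repeat each row `count` times, keeping only the per-industry variables.
expanded <- industries[rep(seq_len(nrow(industries)), industries$count),
                       c("Industry", "lost_job_pct", "tertiary_pct")]
rownames(expanded) <- NULL

print(nrow(expanded))  # 132 + 45 + 201 + 78 = 456
```

Multiplying by 1000 instead would give one row per person, at the cost of far more rows; either way, each row still carries only its industry's rates.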

Thank you, I'll try that!
