I'm trying to calculate some basic means from a dataset, but the dataset is incomplete. It has two columns: one numeric, and one categorical with two possible values. I have numerical data for every row, but the categorical value is missing for 50 of them. I want to calculate the mean and median of each category under every possible assignment of the missing values, to see the range and spread of the possible results. This will help me understand the "true" mean.
I figure this means that there are 2^50 different possibilities. Is it possible to calculate all of these in R? Or is this too many? I might be able to reduce it down a bit, but not much.
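To make the scale concrete, here is a minimal sketch of the counting I have in mind. The labels `"M"` and `"F"` and the 3-person toy case are made up for illustration; only the figure of 50 missing values comes from my data.

```r
# Each of the 50 unlabelled rows can take one of two values,
# so the number of possible labellings is 2^50 (roughly 1.1e15).
n_missing <- 50
n_combos <- 2^n_missing
print(format(n_combos, big.mark = ",", scientific = FALSE))

# Toy version with only 3 unlabelled rows (hypothetical labels):
# expand.grid() lists every combination, one per row.
combos <- expand.grid(rep(list(c("M", "F")), 3), stringsAsFactors = FALSE)
print(combos)
print(nrow(combos))  # 2^3 = 8 labellings
```

The toy case enumerates fine, but the same `expand.grid()` call with 50 columns would need about 1.1e15 rows, which is what makes me doubt that a full enumeration is feasible.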
Apologies if this is a basic question. I'm not massively familiar with R but am trying!
EDIT
I am trying to calculate the pay gap between men and women. The numerical data is pay, and the categorical data is gender. To calculate the gap, I need to do this calculation: (mean pay for men minus mean pay for women) / (mean pay for men). However, I do not have gender data for 50 people. They could be all men, all women, or any one of 2^50 different combinations of men and women. I want to calculate the gap for every combination to see which values are most likely.
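For reference, this is a rough sketch of the gap calculation for a single assignment of the unknown genders. The function name `pay_gap`, the labels `"M"` and `"F"`, and all the pay figures are invented for illustration; my real data is much larger.

```r
# Pay gap as defined above: (mean pay for men - mean pay for women) / mean pay for men
pay_gap <- function(pay, gender) {
  mean_m <- mean(pay[gender == "M"])
  mean_f <- mean(pay[gender == "F"])
  (mean_m - mean_f) / mean_m
}

# Hypothetical rows where gender is known
known_pay    <- c(40000, 60000, 40000, 50000)
known_gender <- c("M", "M", "F", "F")

# Hypothetical rows where gender is missing (2 here instead of 50)
unknown_pay <- c(30000, 70000)

# Two of the possible assignments: all unknowns are men, or all are women
gap_all_men   <- pay_gap(c(known_pay, unknown_pay), c(known_gender, "M", "M"))
gap_all_women <- pay_gap(c(known_pay, unknown_pay), c(known_gender, "F", "F"))

print(gap_all_men)    # 0.1
print(gap_all_women)  # 0.05
```

What I would like is to repeat this for every possible assignment of the 50 unknowns and look at the distribution of the resulting gaps.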