Hi, I want to use an Oaxaca-Blinder decomposition in my thesis to examine the wage gap between male and female workers. However, I use labour survey data, so the data are clustered and every observation is weighted so that the sample represents the finite population. I guess I need to include these weights in my regression, but the `oaxaca` package has no function for this. I read that I can do it with the `survey` package, but I could not figure out how. Does anyone have experience with that? Any help would be appreciated! Thanks!

Hi, and welcome!

Please see the FAQ: What's a reproducible example (`reprex`) and how do I do one? A reprex, complete with representative data, will attract quicker and more answers. In this case, however, where you're looking for general guidance, a `reprex` is not needed.

I suspect your question is very narrow for this discussion group, but someone (maybe even me!) might be able to help with more information.

How does your data structure differ from the `chicago` data frame in its layout?

```
library(oaxaca)
#>
#> Please cite as:
#> Hlavac, Marek (2018). oaxaca: Blinder-Oaxaca Decomposition in R.
#> R package version 0.1.4. https://CRAN.R-project.org/package=oaxaca
# load data set of Hispanic workers in Chicago
data(chicago)
str(chicago)
#> 'data.frame': 712 obs. of 9 variables:
#> $ age : int 52 46 31 35 19 50 33 43 39 22 ...
#> $ female : int 0 1 1 0 0 1 0 0 1 0 ...
#> $ foreign.born : int 1 1 1 1 0 1 1 1 1 0 ...
#> $ LTHS : int 0 0 0 0 0 1 1 0 0 0 ...
#> $ high.school : int 1 1 1 1 1 0 0 1 1 0 ...
#> $ some.college : int 0 0 0 0 0 0 0 0 0 1 ...
#> $ college : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ advanced.degree: int 0 0 0 0 0 0 0 0 0 0 ...
#> $ ln.real.wage : num 2.14 NA 2.5 2.71 2.08 ...
```

Created on 2020-03-14 by the reprex package (v0.3.0)

Also, is your data similar to the `apiclus1` dataset from `data(api)` in the `survey` package? If so, see the package's extended vignette.
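If what you need first is weighted point estimates, one option is to run the decomposition by hand. The sketch below is not the `oaxaca` package's own method: it fits weighted least squares separately for men and women, takes weighted covariate means, and combines them in the usual threefold split (endowments, coefficients, interaction) with women's coefficients as the reference. The column names `female`, `ln_wage` and `wt`, the function `weighted_decomp()`, and the simulated data are all hypothetical. Weighted least squares gives the same coefficients as `survey::svyglm()` with the same weights, but the standard errors it implies are not design-based, so for inference you would still want a `survey` design (for example with replicate weights).

```r
# Sketch: survey-weighted threefold Oaxaca-Blinder decomposition in base R.
# ASSUMPTIONS (hypothetical, not from the thread): a 0/1 indicator `female`,
# a log-wage outcome `ln_wage`, and one sampling-weight column `wt`.
# The data are simulated for illustration only; drop rows with missing
# values before using real data, since model.matrix() silently omits them.
set.seed(1)
n <- 500
df <- data.frame(
  female = rbinom(n, 1, 0.5),
  age    = sample(20:60, n, replace = TRUE),
  educ   = sample(8:18, n, replace = TRUE),
  wt     = runif(n, 0.5, 3)  # hypothetical survey weights
)
df$ln_wage <- 1 + 0.01 * df$age + 0.05 * df$educ - 0.1 * df$female +
  rnorm(n, sd = 0.3)

weighted_decomp <- function(data, rhs, outcome, group, weight) {
  a <- data[data[[group]] == 0, ]  # group A: men
  b <- data[data[[group]] == 1, ]  # group B: women (reference coefficients)

  # Design matrices (with intercept) built from the right-hand-side formula
  X_a <- model.matrix(rhs, a)
  X_b <- model.matrix(rhs, b)

  # Weighted least squares per group: same point estimates as
  # survey::svyglm() with the same weights, but NOT design-based SEs
  beta_a <- lm.wfit(X_a, a[[outcome]], w = a[[weight]])$coefficients
  beta_b <- lm.wfit(X_b, b[[outcome]], w = b[[weight]])$coefficients

  # Weighted covariate means (each row of X scaled by its weight)
  xbar_a <- colSums(X_a * a[[weight]]) / sum(a[[weight]])
  xbar_b <- colSums(X_b * b[[weight]]) / sum(b[[weight]])

  dx <- xbar_a - xbar_b
  db <- beta_a - beta_b
  c(
    gap          = weighted.mean(a[[outcome]], a[[weight]]) -
                   weighted.mean(b[[outcome]], b[[weight]]),
    endowments   = sum(dx * beta_b),  # part explained by characteristics
    coefficients = sum(xbar_b * db),  # part due to different returns
    interaction  = sum(dx * db)       # joint effect of both differences
  )
}

res <- weighted_decomp(df, ~ age + educ, "ln_wage", "female", "wt")
print(round(res, 4))
```

Because each weighted regression includes an intercept, the three components add up exactly to the weighted mean gap, which makes a handy sanity check when you swap in your own data and weighting factor.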

Hi, thanks for your answer and your guide to the reprex! It actually looks like the `chicago` data frame. There is an additional variable that holds the weight of each observation, since it is a stratified sample. There are two weighting factors: one is for "statistics with net wages and to calculate the number of employees" and the other for "statistics of standardized gross wages and to calculate the full-time equivalents". So, as far as I understand, when I want to do an Oaxaca decomposition for the standardized gross wages, I also need to include these weights? But I am not sure how to do that. Thanks a lot!

OK, thanks. Got you, I think. The two weighting factors put each observation on the same footing to account for differences in deductions and time worked for each period. I'll go back to the documentation to see if that is a use case that it covers.

The `oaxaca` package uses `weights` on between-group comparisons. The weights in your case relate to developing a response variable on an apples-to-apples basis; that is, a response variable Y that results from putting all the observations on a common basis with respect to net pay per unit of time worked.

Once you've calculated Y and added it to your data frame, your model will look like `lm(Y ~ X1 + ... + Xn)`, and `oaxaca` only takes care of the inter-group weighting.

To calculate Y, you'd want to take the observed pay and multiply it by both weights to get an hourly or full-time equivalent figure. (Easy to say for the guy who doesn't have to do it himself.)
