Oaxaca Blinder Decomposition

bibimbap · March 14, 2020, 12:09pm

Hi, I want to use an Oaxaca-Blinder decomposition in my thesis to examine the wage gap between male and female workers. However, I use labour survey data, which is clustered, and every observation is weighted so that the sample is representative of the finite population. I guess I need to include that in my regression, but oaxaca has no function for this. I read that I can do it with the survey package, but I could not find out how. Does anyone have experience with that? Any help would be appreciated! Thanks!

technocrat · March 14, 2020, 7:21pm

Hi, and welcome!

Please see the FAQ: What's a reproducible example (`reprex`) and how do I do one? A reprex, complete with representative data, will attract quicker and more answers. In this case, however, where you're looking for general guidance, a reprex is not needed.

I suspect your question is very narrow for this discussion group, but someone (maybe even me!) might be able to help with more information.

How does your data structure differ from the chicago data frame in its layout?

library(oaxaca)
#> 
#> Please cite as:
#>  Hlavac, Marek (2018). oaxaca: Blinder-Oaxaca Decomposition in R.
#>  R package version 0.1.4. https://CRAN.R-project.org/package=oaxaca

# load data set of Hispanic workers in Chicago
data(chicago)

str(chicago)
#> 'data.frame':    712 obs. of  9 variables:
#>  $ age            : int  52 46 31 35 19 50 33 43 39 22 ...
#>  $ female         : int  0 1 1 0 0 1 0 0 1 0 ...
#>  $ foreign.born   : int  1 1 1 1 0 1 1 1 1 0 ...
#>  $ LTHS           : int  0 0 0 0 0 1 1 0 0 0 ...
#>  $ high.school    : int  1 1 1 1 1 0 0 1 1 0 ...
#>  $ some.college   : int  0 0 0 0 0 0 0 0 0 1 ...
#>  $ college        : int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ advanced.degree: int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ ln.real.wage   : num  2.14 NA 2.5 2.71 2.08 ...

Created on 2020-03-14 by the reprex package (v0.3.0)

Also, is your data similar to the apiclus1 dataset from data(api) in the survey package? If so, see the extended vignette.

bibimbap · March 14, 2020, 11:26pm

Hi, thanks for your answer and your guide for the reprex! It actually looks like the chicago data frame. There is an additional variable that includes the weight of each observation, since it is a stratified sample. There are two weighting factors: one is for "statistics with net wages and to calculate the amount of employees" and the other for "statistics of standardized gross wages and to calculate the full-time equivalents". So as far as I understand, when I want to do an Oaxaca decomposition for the standardized gross wages, I also need to include these weights? But I am not sure how to do that. Thanks a lot!

technocrat · March 15, 2020, 4:10am

OK, thanks. Got you, I think. The two weighting factors put each observation on the same footing to account for differences in deductions and time worked for each period. I'll go back to the documentation to see if that is a use case that it covers.

technocrat · March 15, 2020, 4:25am

The oaxaca package uses weights on between-group comparisons. The weights for your case relate to developing a response variable on an apples-to-apples basis. That is, a response variable Y that is the result of putting all the observations on a common basis with respect to net pay per unit time worked.
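
For reference, here is roughly where that between-group weight enters, written out for the twofold decomposition with a weighted-average choice of reference coefficients. The notation is my own shorthand rather than anything printed by the package: A and B are the two groups, the bars are group means, the betas are each group's regression coefficients, and w is the weight given to Group A.

```latex
\begin{aligned}
\bar{Y}_A - \bar{Y}_B
  &= \underbrace{(\bar{X}_A - \bar{X}_B)^{\top}\beta_R}_{\text{explained}}
   + \underbrace{\bar{X}_A^{\top}(\beta_A - \beta_R)
   + \bar{X}_B^{\top}(\beta_R - \beta_B)}_{\text{unexplained}}, \\
\beta_R &= w\,\beta_A + (1 - w)\,\beta_B, \qquad 0 \le w \le 1 .
\end{aligned}
```

So w only decides which coefficients serve as the reference set for the comparison; it is not a per-observation weight.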

Once you've calculated Y and added it to your data frame, your model will look like lm(Y ~ X1 ... Xn) and oaxaca only takes care of the inter-group weighting.
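
As a rough sketch of what that call could look like, here it is with the chicago data from above standing in for your survey data. The Y column is only a placeholder I create from ln.real.wage, the predictor list is borrowed from chicago rather than from your data, and R = 30 is just there to keep the bootstrap quick.

```r
library(oaxaca)

# chicago ships with the oaxaca package (same data set as shown earlier)
data(chicago)

# Placeholder for the constructed response; with real data this would be the
# column you computed yourself (the name Y is hypothetical)
chicago$Y <- chicago$ln.real.wage

# Formula layout: response ~ predictors | group indicator
# female = 0 is treated as Group A, female = 1 as Group B
fit <- oaxaca(
  Y ~ age + LTHS + some.college + college + advanced.degree | female,
  data = chicago,
  R = 30  # small number of bootstrap replicates, only to keep the sketch fast
)

fit$y                # group means of Y and their difference
fit$twofold$overall  # explained / unexplained split for each reference weighting
```

The group indicator after the | is what makes this a male/female comparison; everything before it is the ordinary regression that gets fitted within each group.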

To calculate Y, you'd want to take the observed pay and multiply it by both weights to get an hourly or full-time equivalent figure. (Easy to say for the guy who doesn't have to do it himself.)
