Is there a weighted.median function?

My data looks like this:

  population_weight ttl_income_national_pretax
              <dbl>                      <dbl>
1             1678.                      1543.
2             7530.                      2257.
3             7642.                      3719.
4             4868.                     14054.
5             7559.                     24771.

If I take the plain median of the 2nd column, it won't be correct, because each row stands for population_weight people with that income.

If I wanted the weighted average I could use

weighted.mean(ttl_income_national_pretax, w = population_weight)

Is there something similar for a weighted median?

There are two ways (that I know of) to do this. The first is to treat ttl_income_national_pretax as a vector in which each value is repeated population_weight times, and then take the ordinary median of that vector. This assumes whole-number weights, because rep() truncates fractional repeat counts. The second is the weighted.median function in the spatstat package, which defines the weighted median as: "a value m such that the total weight of data to the left of m is equal to half the total weight. If there is no such value, linear interpolation is performed."

df = read.table(text="row  population_weight ttl_income_national_pretax

1             1678.                      1543.
2             7530.                      2257.
3             7642.                      3719.
4             4868.                     14054.
5             7559.                     24771.", header=TRUE)

with(df, median(rep(ttl_income_national_pretax, population_weight)))
#> [1] 3719

library(spatstat)

weighted.median(df$ttl_income_national_pretax, df$population_weight)
#> [1] 3295.915

# Hand calculation of spatstat weighted.median for your data
x = df$ttl_income_national_pretax
w = df$population_weight
Fx = cumsum(w)/sum(w)

# Note that the cumulative weight crosses the 50% mark between the 
#  2nd and 3rd elements
Fx
#> [1] 0.05731462 0.31451310 0.57553711 0.74181098 1.00000000

x[2] + (0.5 - Fx[2])/(Fx[3] - Fx[2]) * (x[3] - x[2])
#> [1] 3295.915

Created on 2020-02-18 by the reprex package (v0.3.0)

You can see exactly what weighted.median is doing by typing weighted.quantile in the console, which will display the function code (weighted.median actually calls weighted.quantile to do the calculation).
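If you would rather not load a package for this, the hand calculation can be wrapped in a small base R function. This is only a sketch: weighted_median_interp is a hypothetical name, not part of spatstat, and it follows the quoted definition (sort by value, then interpolate linearly where the cumulative weight crosses 50%) rather than reproducing the spatstat source. It assumes non-negative weights and no missing values.

```r
# Hypothetical helper (not from spatstat): weighted median by linear
# interpolation of the cumulative weight, as in the hand calculation.
weighted_median_interp <- function(x, w) {
  stopifnot(length(x) == length(w), all(w >= 0), sum(w) > 0)

  # Order by value before accumulating weights
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  Fx <- cumsum(w) / sum(w)

  # If the smallest value already carries half the weight, return it
  if (Fx[1] >= 0.5) return(x[1])

  # First index whose cumulative weight reaches 50%; i >= 2 here
  i <- which(Fx >= 0.5)[1]
  x[i - 1] + (0.5 - Fx[i - 1]) / (Fx[i] - Fx[i - 1]) * (x[i] - x[i - 1])
}

income <- c(1543, 2257, 3719, 14054, 24771)
weight <- c(1678, 7530, 7642, 4868, 7559)

weighted_median_interp(income, weight)
#> [1] 3295.915
```

Because the function sorts its input, the rows do not need to be ordered by income beforehand, and the weights do not need to be whole numbers.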

The difference between the two methods will be no more than the difference between the two data values that straddle a cumulative weight of 50% (the data must be ordered from lowest to highest value before determining the cumulative weight). For example, in your data, if you change 2257 to, say, 3709, you'll see that the weighted median is now between 3709 and 3719, instead of between 2257 and 3719.
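Here is a quick check of that bound using only base R, with 2257 replaced by 3709 as suggested. The variable names are just for illustration, and the interpolation line is the same hand calculation as before, not a call into spatstat.

```r
x <- c(1543, 3709, 3719, 14054, 24771)  # 2257 replaced by 3709
w <- c(1678, 7530, 7642, 4868, 7559)    # rows already ordered by x

Fx <- cumsum(w) / sum(w)
hi <- which(Fx >= 0.5)[1]  # first value at or past the 50% mark
lo <- hi - 1               # last value before it

# Method 1: expand and take the ordinary median (weights are whole numbers)
expanded <- median(rep(x, w))

# Method 2: linear interpolation between the two straddling values
interp <- x[lo] + (0.5 - Fx[lo]) / (Fx[hi] - Fx[lo]) * (x[hi] - x[lo])

expanded
#> [1] 3719
interp
#> [1] 3716.106

# The gap between the methods is within the gap between the straddling values
abs(expanded - interp) <= x[hi] - x[lo]
#> [1] TRUE
```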


I had never heard of the spatstat package, but it does exactly what I need. The first solution is also very elegant, and better than what I was going to do.
