# Is there a weighted.median function?

My data looks like this:

``````
  population_weight ttl_income_national_pretax
              <dbl>                      <dbl>
1             1678.                      1543.
2             7530.                      2257.
3             7642.                      3719.
4             4868.                     14054.
5             7559.                     24771.
``````

If I take the plain median of the 2nd column it won't be correct, because each row stands for `population_weight` people with that income.

If I wanted the weighted average I could use

``````
weighted.mean(ttl_income_national_pretax, w = population_weight)
``````

Is there something similar for a weighted median?

There are two potential ways (that I know of) to do this. The first is to treat `ttl_income_national_pretax` as if it were a vector of values in which each unique value is repeated `population_weight` times. Then you just find the median of that vector. The second is the `weighted.median` function in the `spatstat` package, which defines the weighted median as: "a value `m` such that the total weight of data to the left of `m` is equal to half the total weight. If there is no such value, linear interpolation is performed."

``````
df = read.table(text="row  population_weight ttl_income_national_pretax
1             1678.                      1543.
2             7530.                      2257.
3             7642.                      3719.
4             4868.                     14054.
5             7559.                     24771.", header=TRUE)

# Method 1: repeat each income by its weight, then take the ordinary median
with(df, median(rep(ttl_income_national_pretax, population_weight)))
#> [1] 3719

library(spatstat)

# Method 2: interpolated weighted median from spatstat
weighted.median(df$ttl_income_national_pretax, df$population_weight)
#> [1] 3295.915
``````
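The `rep()` approach builds a vector with one element per person, which can get large and only works with whole-number weights. As a rough sketch of an alternative (not what `median()` does internally), you can sort by income and pick the first value whose cumulative weight reaches half the total. It assumes the data is retyped as plain vectors `x` and `w`, and it gives the same answer as the `rep()` method here; the two can differ when the halfway point falls exactly on the boundary between two distinct values.

```r
# Data from the question, retyped as plain vectors
x <- c(1543, 2257, 3719, 14054, 24771)  # incomes
w <- c(1678, 7530, 7642, 4868, 7559)    # population weights

# Sort incomes and accumulate the weights in that order
ord <- order(x)
cw  <- cumsum(w[ord])

# First income whose cumulative weight reaches half of the total weight
lower_wmedian <- x[ord][which(cw >= sum(w) / 2)[1]]
lower_wmedian
```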
``````
# Hand calculation of spatstat weighted.median for your data
x = df$ttl_income_national_pretax
w = df$population_weight
Fx = cumsum(w)/sum(w)

# Note that the cumulative weight crosses the 50% mark between the
#  2nd and 3rd elements
Fx
#> [1] 0.05731462 0.31451310 0.57553711 0.74181098 1.00000000

# Linear interpolation between the 2nd and 3rd values at cumulative weight 0.5
x[2] + (0.5 - Fx[2])/(Fx[3] - Fx[2]) * (x[3] - x[2])
#> [1] 3295.915
``````
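The hand calculation above hard-codes the 2nd and 3rd elements. A minimal sketch of the same interpolation for arbitrary data might look like the following. The function name `weighted_median_interp` is made up for illustration, and this is not the `spatstat` source, just the interpolation rule quoted earlier written out in base R.

```r
# Hypothetical helper: interpolated weighted median in base R
weighted_median_interp <- function(x, w) {
  stopifnot(length(x) == length(w), all(w >= 0), sum(w) > 0)

  # Order the values from lowest to highest, carrying the weights along
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]

  # Cumulative share of the total weight
  Fx <- cumsum(w) / sum(w)

  # First position where the cumulative share reaches 50%
  i <- which(Fx >= 0.5)[1]
  if (i == 1) return(x[1])  # nothing to the left to interpolate with

  # Linear interpolation between the two values that straddle 50%
  x[i - 1] + (0.5 - Fx[i - 1]) / (Fx[i] - Fx[i - 1]) * (x[i] - x[i - 1])
}

x <- c(1543, 2257, 3719, 14054, 24771)
w <- c(1678, 7530, 7642, 4868, 7559)
result <- weighted_median_interp(x, w)
result  # about 3295.915, matching the hand calculation
```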

Created on 2020-02-18 by the reprex package (v0.3.0)

You can see exactly what `weighted.median` is doing by typing `weighted.quantile` in the console, which will display the function code (`weighted.median` actually calls `weighted.quantile` to do the calculation).

The difference between the two methods will be no more than the difference between the two data values that straddle a cumulative weight of 50% (the data must be ordered from lowest to highest value before determining the cumulative weight). For example, in your data, if you change `2257` to, say, `3709`, you'll see that the weighted median is now between 3709 and 3719, instead of between 2257 and 3719.
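To check that claim without installing anything, here is a small base R sketch that compares the two methods before and after changing `2257` to `3709`. The helper names `rep_median` and `interp_median` are made up for this example, and `interp_median` uses `approx()` as a stand-in for the interpolation step rather than calling `spatstat`.

```r
x <- c(1543, 2257, 3719, 14054, 24771)
w <- c(1678, 7530, 7642, 4868, 7559)

# Method 1: expand by weight, then take the ordinary median
rep_median <- function(x, w) median(rep(x, w))

# Method 2 (sketch): interpolate the value at a cumulative weight share of 50%
interp_median <- function(x, w) {
  ord <- order(x)
  approx(cumsum(w[ord]) / sum(w), x[ord], xout = 0.5)$y
}

before <- c(rep = rep_median(x, w), interp = interp_median(x, w))

# Change the 2nd income from 2257 to 3709
x2 <- replace(x, 2, 3709)
after <- c(rep = rep_median(x2, w), interp = interp_median(x2, w))

before  # rep = 3719, interp is about 3295.915
after   # rep = 3719, interp is about 3716.106 (between 3709 and 3719)
```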


I never heard of the `spatstat` package but it does exactly what I need. The first solution was very elegant also. Better than what I was going to do.
