There are two potential ways (that I know of) to do this. The first is to treat `ttl_income_national_pretax`

as if it were a vector of values in which each unique value is repeated `population_weight`

times. Then you just find the median of that vector. The second is the `weighted.median`

function in the `spatstat`

package, which defines the weighted median as: "a value `m`

such that the total weight of data to the left of `m`

is equal to half the total weight. If there is no such value, linear interpolation is performed."

```
df = read.table(text="row population_weight ttl_income_national_pretax
1 1678. 1543.
2 7530. 2257.
3 7642. 3719.
4 4868. 14054.
5 7559. 24771.", header=TRUE)
with(df, median(rep(ttl_income_national_pretax, population_weight)))
#> [1] 3719
library(spatstat)
weighted.median(df$ttl_income_national_pretax, df$population_weight)
#> [1] 3295.915
```

```
# Hand calculation of spatstat weighted.median for your data
x = df$ttl_income_national_pretax
w = df$population_weight
Fx = cumsum(w)/sum(w)
# Note that the cumulative weight crosses the 50% mark between the
# 2nd and 3rd elements
Fx
#> [1] 0.05731462 0.31451310 0.57553711 0.74181098 1.00000000
x[2] + (0.5 - Fx[2])/(Fx[3] - Fx[2]) * (x[3] - x[2])
#> [1] 3295.915
```

^{Created on 2020-02-18 by the reprex package (v0.3.0)}

You can see exactly what `weighted.median`

is doing by typing `weighted.quantile`

in the console, which will display the function code (`weighted.median`

actually calls `weighted.quantile`

to do the calculation).

The difference between the two methods will be no more than the difference between the two data values that straddle a cumulative weight of 50% (the data must be ordered from lowest to highest value before determining the cumulative weight). For example, in your data, if you change `2257`

to, say, `3709`

, you'll see that the weighted median is now between 3709 and 3719, instead of between 2257 and 3719.