Running median for time series data

Hello,

I have a dataframe with values for each age between 0 and 110 for each year between 1867 and now. As I don't know how to describe better my data I joined 2 printscreen of it.

Capture d’écran 2022-06-25 à 14.30.28

Capture d’écran 2022-06-25 à 14.31.11

For each year, I need to take the age corresponding to the median of the column "lx". lx is a column representing the number of survivors in a given population for each year starting from 100000. So for each year I need to take the age for which we had 50.000 survivors.

I have been trying this code but it doesn't work, it gives me only one value :

Median_Ages <-female.period.lifetable_1x1$Age[female.period.lifetable_1x1$lx == median(female.period.lifetable_1x1$lx) ]

Could someone help me to select the values I need?

Thanks in advance

I do not think you want to use the median() function if what you want is the age which has an lx value of 50000. Of course, it is unlikely that any age will have a value of exactly 50000. In the code below, I select for the year that has the lx value that is closest to 50000. Other possible choices would be the last age that has lx > 50000 or the first age that has lx < 50000.

library(dplyr)

DF <- data.frame(Year = rep(c(1900,1901), each = 5),
                 Age = c(1,2,3,4,5,1,2,3,4,5),
                 lx = c(100000, 72000, 52000, 41000, 23000,
                        100000, 51000, 48000, 30000, 17000))
DF
#>    Year Age     lx
#> 1  1900   1 100000
#> 2  1900   2  72000
#> 3  1900   3  52000
#> 4  1900   4  41000
#> 5  1900   5  23000
#> 6  1901   1 100000
#> 7  1901   2  51000
#> 8  1901   3  48000
#> 9  1901   4  30000
#> 10 1901   5  17000
DF <- DF |> mutate(AbsDiff = abs(lx - 50000))
DF
#>    Year Age     lx AbsDiff
#> 1  1900   1 100000   50000
#> 2  1900   2  72000   22000
#> 3  1900   3  52000    2000
#> 4  1900   4  41000    9000
#> 5  1900   5  23000   27000
#> 6  1901   1 100000   50000
#> 7  1901   2  51000    1000
#> 8  1901   3  48000    2000
#> 9  1901   4  30000   20000
#> 10 1901   5  17000   33000
Closest <- DF |> group_by(Year) |> summarise(MinDiff = min(AbsDiff))
ChosenAges <- semi_join(DF, Closest, by = c("Year", "AbsDiff" = "MinDiff"))
ChosenAges
#>   Year Age    lx AbsDiff
#> 1 1900   3 52000    2000
#> 2 1901   2 51000    1000

Created on 2022-06-25 by the reprex package (v2.0.1)

1 Like

It works, thank you so much!

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.