How can I complete a data.table column with values from this column? (or how can I devide two rows not in sequence)

I have a data.table() with the columns : city,month, year and gdp_per_capta. (actually I have 30 more columns)

My base is full fill from 2005 to 2018.
But, to the year 2019, the gdp_per_capta has NA.

I would like to put a value on it, this way: 2019 = (2018/2017) * 2018

How can I do it?

PS: How can I improve the title of this question?

Example of my data:

structure(list(city = c(110001L, 110001L, 110001L, 110001L, 
                                      110001L, 110001L, 110001L, 110001L, 110001L, 110001L, 110001L, 
                                      110001L, 110037L, 110037L, 110037L, 110037L, 110037L, 110037L, 
                                      110037L, 110037L, 110037L, 110037L, 110037L, 110037L, 110040L, 
                                      110040L, 110040L, 110040L, 110040L, 110040L, 110040L, 110040L, 
                                      110040L, 110040L, 110040L, 110040L), month = c("01", "04", "07", 
                                                                                   "10", "02", "05", "08", "11", "03", "06", "09", "12", "01", "04", 
                                                                                   "07", "10", "02", "05", "08", "11", "03", "06", "09", "12", "01", 
                                                                                   "04", "07", "10", "02", "05", "08", "11", "03", "06", "09", "12"
                                      ), year = c(2017L, 2017L, 2017L, 2017L, 2018L, 2018L, 2018L, 2018L, 
                                                 2019L, 2019L, 2019L, 2019L, 2017L, 2017L, 2017L, 2017L, 2018L, 
                                                 2018L, 2018L, 2018L, 2019L, 2019L, 2019L, 2019L, 2017L, 2017L, 
                                                 2017L, 2017L, 2018L, 2018L, 2018L, 2018L, 2019L, 2019L, 2019L, 
                                                 2019L), gdp_per_capta = c(19081.43, 19081.43, 19081.43, 
                                                                                  19081.43, 21552.47, 21552.47, 21552.47, 21552.47, NA, NA, NA, 
                                                                                  NA, 23353.6, 23353.6, 23353.6, 23353.6, 21053.93, 21053.93, 21053.93, 
                                                                                  21053.93, NA, NA, NA, NA, 14699.7, 14699.7, 14699.7, 14699.7, 
                                                                                  15655.57, 15655.57, 15655.57, 15655.57, NA, NA, NA, NA)), row.names = c(NA, 
                                                                                                                                                          -36L), class = c("data.table", "data.frame"))

Expected result:

resultadoesperado

Imputation of missing values using that algorithm can be done with

suppressPackageStartupMessages({
  library(dplyr)
})
DF <- tibble(city = c(
  110001L, 110001L, 110001L, 110001L,
  110001L, 110001L, 110001L, 110001L, 110001L, 110001L, 110001L,
  110001L, 110037L, 110037L, 110037L, 110037L, 110037L, 110037L,
  110037L, 110037L, 110037L, 110037L, 110037L, 110037L, 110040L,
  110040L, 110040L, 110040L, 110040L, 110040L, 110040L, 110040L,
  110040L, 110040L, 110040L, 110040L
), month = c(
  "01", "04", "07",
  "10", "02", "05", "08", "11", "03", "06", "09", "12", "01", "04",
  "07", "10", "02", "05", "08", "11", "03", "06", "09", "12", "01",
  "04", "07", "10", "02", "05", "08", "11", "03", "06", "09", "12"
), year = c(
  2017L, 2017L, 2017L, 2017L, 2018L, 2018L, 2018L, 2018L,
  2019L, 2019L, 2019L, 2019L, 2017L, 2017L, 2017L, 2017L, 2018L,
  2018L, 2018L, 2018L, 2019L, 2019L, 2019L, 2019L, 2017L, 2017L,
  2017L, 2017L, 2018L, 2018L, 2018L, 2018L, 2019L, 2019L, 2019L,
  2019L
), gdp_per_capta = c(
  19081.43, 19081.43, 19081.43,
  19081.43, 21552.47, 21552.47, 21552.47, 21552.47, NA, NA, NA,
  NA, 23353.6, 23353.6, 23353.6, 23353.6, 21053.93, 21053.93, 21053.93,
  21053.93, NA, NA, NA, NA, 14699.7, 14699.7, 14699.7, 14699.7,
  15655.57, 15655.57, 15655.57, 15655.57, NA, NA, NA, NA
))

impute <- function(x) lag(x,4)/lag(x,8)*lag(x,4)
DF %>% mutate(gdp_per_capta = ifelse(is.na(gdp_per_capta),impute(gdp_per_capta),gdp_per_capta)) %>% print(n = Inf)
#> # A tibble: 36 x 4
#>      city month  year gdp_per_capta
#>     <int> <chr> <int>         <dbl>
#>  1 110001 01     2017        19081.
#>  2 110001 04     2017        19081.
#>  3 110001 07     2017        19081.
#>  4 110001 10     2017        19081.
#>  5 110001 02     2018        21552.
#>  6 110001 05     2018        21552.
#>  7 110001 08     2018        21552.
#>  8 110001 11     2018        21552.
#>  9 110001 03     2019        24344.
#> 10 110001 06     2019        24344.
#> 11 110001 09     2019        24344.
#> 12 110001 12     2019        24344.
#> 13 110037 01     2017        23354.
#> 14 110037 04     2017        23354.
#> 15 110037 07     2017        23354.
#> 16 110037 10     2017        23354.
#> 17 110037 02     2018        21054.
#> 18 110037 05     2018        21054.
#> 19 110037 08     2018        21054.
#> 20 110037 11     2018        21054.
#> 21 110037 03     2019        18981.
#> 22 110037 06     2019        18981.
#> 23 110037 09     2019        18981.
#> 24 110037 12     2019        18981.
#> 25 110040 01     2017        14700.
#> 26 110040 04     2017        14700.
#> 27 110040 07     2017        14700.
#> 28 110040 10     2017        14700.
#> 29 110040 02     2018        15656.
#> 30 110040 05     2018        15656.
#> 31 110040 08     2018        15656.
#> 32 110040 11     2018        15656.
#> 33 110040 03     2019        16674.
#> 34 110040 06     2019        16674.
#> 35 110040 09     2019        16674.
#> 36 110040 12     2019        16674.
1 Like

If you want to stay within data.table, then this would produce the same output:

df[, gdp_per_capta := ifelse(is.na(gdp_per_capta), shift(gdp_per_capta, 4L) / shift(gdp_per_capta, 8L) * shift(gdp_per_capta, 4L), gdp_per_capta)]
1 Like

Cool! Thanks a lot! Didn't know this function Shift

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.

Thanks! This helps a lot.