Imputing with last known value

Hi RStudio Community!!

I'm trying to impute values for a column by using the last known value. Check this toy example out:

library(tidyverse)

my_tibble <- tibble(
  index = seq(1:10),
  quantity = c(100, 200, NA, NA, 300, NA, 400, 400, 500, NA)
)

my_tibble
# A tibble: 10 x 2
   index quantity
   <int>    <dbl>
 1     1      100
 2     2      200
 3     3       NA
 4     4       NA
 5     5      300
 6     6       NA
 7     7      400
 8     8      400
 9     9      500
10    10       NA

If we check for missing values, we find that values for index = c(3, 4, 6, 10) are NAs:

my_tibble %>% 
  filter(is.na(quantity))

# A tibble: 4 x 2
  index quantity
  <int>    <dbl>
1     3       NA
2     4       NA
3     6       NA
4    10       NA

Now... what I've been trying to do (but failed massively) is to impute the missing values with the last known value. Thus the output would be:

# A tibble: 10 x 2
   index quantity
   <int>    <dbl>
 1     1       100
 2     2       200
 3     3       200           # imputed using quantity at index 2
 4     4       200           # imputed using quantity at index 2
 5     5       300
 6     6       300           # imputed using quantity at index 5
 7     7       400
 8     8       400
 9     9       500
10    10     500            # imputed using quantity at index 9

I've been stuck with this for over a week now, so I surrendered and decided to ask around.

I thank you in advance. :smiley:

Best,
Alexis

usually we try to avoid for loops, but this is a use case where its justified.

for(i in seq_along(my_tibble$quantity)){
  if(i>1){
    if(is.na(my_tibble$quantity[i])){
      my_tibble$quantity[i] <- my_tibble$quantity[i-1]
    }
  }
}

This is how I would have solved it.

library(tidyr)

my_tibble <- tibble(
  index = seq(1:10),
  quantity = c(100, 200, NA, NA, 300, NA, 400, 400, 500, NA)
)



my_tibble %>% fill(quantity)
#> # A tibble: 10 x 2
#>    index quantity
#>    <int>    <dbl>
#>  1     1      100
#>  2     2      200
#>  3     3      200
#>  4     4      200
#>  5     5      300
#>  6     6      300
#>  7     7      400
#>  8     8      400
#>  9     9      500
#> 10    10      500

Created on 2020-10-14 by the reprex package (v0.3.0)

2 Likes

nice ! thats a handy tidyr function

1 Like

Man, the tidyr package is super amazing. I definitely need to dive deeper into it. Thank you so much for the super fast answer!!

Best,
Alexis

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.