Ipolate function in R

I am looking for a funciton in R similar to ipolate function in STATA (https://www.stata.com/manuals13/dipolate.pdf)

seems like that function simply fits an lm() and then when there are missing values takes the filling values from the predicted output of the lm.
Do you have experience with R's lm and predict ?

x 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019
y 53.142662 53.565829 54.623546 56.682212 59.495817 43.86624 53.134923 22.206799
y1 53.142662 53.565829 54.623546 56.682212 59.495817 43.86624 53.134923 37.670861 22.206799 6.7427376

y has a missing values, after using ipolate in stata, we get y1.
I really appreciate if you can show how to do so in R

I'm not a stata user but the documentation for the function you linked to implied to me that the the values would go vertically rather than horizontally as you illustrate them here ?

Assuming you had vertical arranged data you can follow my example


#example data
(spoiled <- structure(list(x = 1:10, y = c(
  1.41, 3.71, NA, 8.31, 10.61, 12.91,
  NA, 5, 19.81, 22.11
)), row.names = c(NA, -10L), class = "data.frame"))


lm_1 <- lm(y ~ x, data = spoiled)

fixed <- spoiled
fixed$y_lm <- predict(lm_1, newdata = spoiled)
fixed$y_fin <- ifelse(is.na(fixed$y), fixed$y_lm, fixed$y)
fixed

subset(fixed,
  select = c(x, y_fin)
)

I shard the data horizontally because i copied from Excel.
However when follow your instruction using my data.

(spoiled <- structure(list(x = 2010:2019, y = c(53.142662, 53.565829, 54.623546, 56.682212, 59.495817, 43.86624, 53.134923, NA, 22.206799, NA)), row.names = c(NA, -10L), class = "data.frame"))
lm_1 <- lm(y ~ x, data = spoiled)

fixed <- spoiled
fixed$y_lm <- predict(lm_1, newdata = spoiled)
fixed$y_fin <- ifelse(is.na(fixed$y), fixed$y_lm, fixed$y)
fixed

subset(fixed,
       select = c(x, y_fin)
)

i am not geting the same result like iploate funciton in stata.
ie
y_fin should be : c(53.142662, 53.565829, 54.623546, 56.682212, 59.495817, 43.86624, 53.134923, 37.670861, 22.206799, 6.7427376)

Here is my implementation:


library(purrr)

closest_points <- function(x1,y1){
diffs <- abs(x1 - y1)
names(diffs)<-y1
sort(as.integer(names(head(sort(diffs),2))))
}

new_point <- function(x,x0,y0,x1,y1){
  ((y1-y0)/(x1-x0))*(x-x0)+y0
}

ipolate <- function(x,y){
  missings_to_fill <- which(is.na(y))
  filled_points <- setdiff(seq_along(y),missings_to_fill)

  step1 <- map(missings_to_fill,
      ~closest_points(.x,filled_points)) 
  names(step1) <- missings_to_fill
  step1
  
  step2 <- imap(step1,
      ~{
        xlocal <- as.integer(.y)
        x0 <- .x[1]
        x1 <- .x[2]
        y0 <- y[x0]
        y1 <- y[x1]
        new_point(x = xlocal,
                  x0=x0,
                  x1=x1,
                  y0=y0,
                  y1=y1)
        })
  y[missings_to_fill] <- unlist(step2)
  y
}
(spoiled <- structure(list(x = 2010:2019, y = c(53.142662, 53.565829, 54.623546, 56.682212, 59.495817, 43.86624, 53.134923, NA, 22.206799, NA)), row.names = c(NA, -10L), class = "data.frame"))

fixed <- spoiled
fixed$y <- ipolate(spoiled$x,spoiled$y)
fixed
1 Like

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.