Ipolate function in R

melgoussi · November 10, 2021, 12:07pm

I am looking for a function in R similar to the ipolate function in Stata (https://www.stata.com/manuals13/dipolate.pdf).

nirgrahamuk · November 10, 2021, 12:54pm

It seems like that function simply fits an lm() and then, where there are missing values, takes the fill values from the predicted output of the lm.
Do you have experience with R's lm() and predict()?

melgoussi · November 10, 2021, 3:40pm

x	2010	2011	2012	2013	2014	2015	2016	2017	2018	2019
y	53.142662	53.565829	54.623546	56.682212	59.495817	43.86624	53.134923	NA	22.206799	NA
y1	53.142662	53.565829	54.623546	56.682212	59.495817	43.86624	53.134923	37.670861	22.206799	6.7427376

y has missing values; after using ipolate in Stata, we get y1.
I would really appreciate it if you could show how to do this in R.

nirgrahamuk · November 10, 2021, 3:51pm

I'm not a Stata user, but the documentation for the function you linked to implied to me that the values would be arranged vertically rather than horizontally as you illustrate them here?

Assuming you have vertically arranged data, you can follow my example:


#example data
(spoiled <- structure(list(x = 1:10, y = c(
  1.41, 3.71, NA, 8.31, 10.61, 12.91,
  NA, 5, 19.81, 22.11
)), row.names = c(NA, -10L), class = "data.frame"))


lm_1 <- lm(y ~ x, data = spoiled)

fixed <- spoiled
fixed$y_lm <- predict(lm_1, newdata = spoiled)
fixed$y_fin <- ifelse(is.na(fixed$y), fixed$y_lm, fixed$y)
fixed

subset(fixed,
  select = c(x, y_fin)
)

melgoussi · November 11, 2021, 12:49pm

I shared the data horizontally because I copied it from Excel.
However, when I follow your instructions using my data:

(spoiled <- structure(list(x = 2010:2019, y = c(53.142662, 53.565829, 54.623546, 56.682212, 59.495817, 43.86624, 53.134923, NA, 22.206799, NA)), row.names = c(NA, -10L), class = "data.frame"))
lm_1 <- lm(y ~ x, data = spoiled)

fixed <- spoiled
fixed$y_lm <- predict(lm_1, newdata = spoiled)
fixed$y_fin <- ifelse(is.na(fixed$y), fixed$y_lm, fixed$y)
fixed

subset(fixed,
       select = c(x, y_fin)
)

I am not getting the same result as the ipolate function in Stata,
i.e.
y_fin should be: c(53.142662, 53.565829, 54.623546, 56.682212, 59.495817, 43.86624, 53.134923, 37.670861, 22.206799, 6.7427376)
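
A quick arithmetic check suggests why the two results differ. The expected values are consistent with a straight line through the two neighbouring observed points (2016 and 2018), not with a single regression line through all points. The sketch below assumes that the 2019 value comes from continuing that same line past the last observation (extrapolation), which is an inference from the numbers, not something confirmed above.

```r
x <- 2010:2019
y <- c(53.142662, 53.565829, 54.623546, 56.682212, 59.495817,
       43.86624, 53.134923, NA, 22.206799, NA)

# Slope of the straight line through the 2016 and 2018 observations
slope <- (y[9] - y[7]) / (x[9] - x[7])

# 2017 lies on that line between the two observations (interpolation)
y_2017 <- y[7] + slope * (x[8] - x[7])

# 2019 lies on the same line continued past 2018 (assumed extrapolation)
y_2019 <- y[9] + slope * (x[10] - x[9])

# For comparison: predictions from one regression line through all points
lm_pred <- predict(lm(y ~ x), newdata = data.frame(x = x[c(8, 10)]))

round(c(y_2017, y_2019), 6)  # 37.670861 and 6.742737
lm_pred
```

The two hand-computed values agree with the expected y_fin to about six decimal places, while the lm() predictions do not.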

nirgrahamuk · November 11, 2021, 1:20pm

Here is my implementation:


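library(purrr)

# Positions of the two observed points used to fill position i: the nearest
# observed neighbour on each side, or, at either end of the series, the two
# nearest observed points (which extrapolates). Assumes rows are sorted by x.
closest_points <- function(i, observed) {
  below <- observed[observed < i]
  above <- observed[observed > i]
  if (length(below) == 0) return(head(above, 2))
  if (length(above) == 0) return(tail(below, 2))
  c(max(below), min(above))
}

# Value at x on the straight line through (x0, y0) and (x1, y1)
new_point <- function(x, x0, y0, x1, y1) {
  ((y1 - y0) / (x1 - x0)) * (x - x0) + y0
}

ipolate <- function(x, y) {
  missings_to_fill <- which(is.na(y))
  filled_points <- setdiff(seq_along(y), missings_to_fill)

  # for each missing position, the positions of the two points to use
  step1 <- map(missings_to_fill,
               ~ closest_points(.x, filled_points))
  names(step1) <- missings_to_fill

  # evaluate the line through those two points at the missing x value
  step2 <- imap(step1, ~ {
    i <- as.integer(.y)
    i0 <- .x[1]
    i1 <- .x[2]
    new_point(x = x[i], x0 = x[i0], y0 = y[i0], x1 = x[i1], y1 = y[i1])
  })
  y[missings_to_fill] <- unlist(step2)
  y
}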

(spoiled <- structure(list(x = 2010:2019, y = c(53.142662, 53.565829, 54.623546, 56.682212, 59.495817, 43.86624, 53.134923, NA, 22.206799, NA)), row.names = c(NA, -10L), class = "data.frame"))

fixed <- spoiled
fixed$y <- ipolate(spoiled$x,spoiled$y)
fixed
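
For comparison, the same kind of fill can be sketched in base R without purrr. `approx()` from the stats package does linear interpolation between observed points but returns NA outside their range, so the ends are handled separately here by continuing the line through the two nearest observed points. The function name `ipolate_base` is hypothetical, and treating the ends this way is an assumption made to reproduce the expected 2019 value; it is not a confirmed description of what Stata does.

```r
# A minimal sketch: linear interpolation of y on x, with linear
# extrapolation at both ends (assumed behaviour, see note above).
ipolate_base <- function(x, y) {
  miss <- is.na(y) & !is.na(x)
  if (!any(miss)) return(y)

  # observed points, sorted by x
  ok <- !is.na(x) & !is.na(y)
  ord <- order(x[ok])
  xo <- x[ok][ord]
  yo <- y[ok][ord]
  n <- length(xo)
  stopifnot(n >= 2)

  out <- y
  # interior gaps: approx() gives NA outside the observed range (rule = 1)
  out[miss] <- approx(xo, yo, xout = x[miss], rule = 1)$y

  # gaps before the first / after the last observation: extend the line
  # through the two nearest observed points
  left <- miss & x < xo[1]
  right <- miss & x > xo[n]
  out[left] <- yo[1] + (yo[2] - yo[1]) / (xo[2] - xo[1]) * (x[left] - xo[1])
  out[right] <- yo[n] +
    (yo[n] - yo[n - 1]) / (xo[n] - xo[n - 1]) * (x[right] - xo[n])
  out
}

x <- 2010:2019
y <- c(53.142662, 53.565829, 54.623546, 56.682212, 59.495817,
       43.86624, 53.134923, NA, 22.206799, NA)
data.frame(x = x, y_fin = ipolate_base(x, y))
```

Because this version works on the actual x values, it should also cope with unevenly spaced x and with runs of consecutive missing values, as long as at least two observations are available.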
