I am looking for a funciton in R similar to ipolate function in STATA (https://www.stata.com/manuals13/dipolate.pdf)

seems like that function simply fits an `lm`

() and then when there are missing values takes the filling values from the predicted output of the lm.

Do you have experience with R's `lm`

and `predict`

?

x | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 |
---|---|---|---|---|---|---|---|---|---|---|

y | 53.142662 | 53.565829 | 54.623546 | 56.682212 | 59.495817 | 43.86624 | 53.134923 | 22.206799 | ||

y1 | 53.142662 | 53.565829 | 54.623546 | 56.682212 | 59.495817 | 43.86624 | 53.134923 | 37.670861 | 22.206799 | 6.7427376 |

y has a missing values, after using ipolate in stata, we get y1.

I really appreciate if you can show how to do so in R

I'm not a stata user but the documentation for the function you linked to implied to me that the the values would go vertically rather than horizontally as you illustrate them here ?

Assuming you had vertical arranged data you can follow my example

```
#example data
(spoiled <- structure(list(x = 1:10, y = c(
1.41, 3.71, NA, 8.31, 10.61, 12.91,
NA, 5, 19.81, 22.11
)), row.names = c(NA, -10L), class = "data.frame"))
lm_1 <- lm(y ~ x, data = spoiled)
fixed <- spoiled
fixed$y_lm <- predict(lm_1, newdata = spoiled)
fixed$y_fin <- ifelse(is.na(fixed$y), fixed$y_lm, fixed$y)
fixed
subset(fixed,
select = c(x, y_fin)
)
```

I shard the data horizontally because i copied from Excel.

However when follow your instruction using my data.

```
(spoiled <- structure(list(x = 2010:2019, y = c(53.142662, 53.565829, 54.623546, 56.682212, 59.495817, 43.86624, 53.134923, NA, 22.206799, NA)), row.names = c(NA, -10L), class = "data.frame"))
lm_1 <- lm(y ~ x, data = spoiled)
fixed <- spoiled
fixed$y_lm <- predict(lm_1, newdata = spoiled)
fixed$y_fin <- ifelse(is.na(fixed$y), fixed$y_lm, fixed$y)
fixed
subset(fixed,
select = c(x, y_fin)
)
```

i am not geting the same result like iploate funciton in stata.

ie

y_fin should be : c(53.142662, 53.565829, 54.623546, 56.682212, 59.495817, 43.86624, 53.134923, **37.670861**, 22.206799, **6.7427376**)

Here is my implementation:

```
library(purrr)
closest_points <- function(x1,y1){
diffs <- abs(x1 - y1)
names(diffs)<-y1
sort(as.integer(names(head(sort(diffs),2))))
}
new_point <- function(x,x0,y0,x1,y1){
((y1-y0)/(x1-x0))*(x-x0)+y0
}
ipolate <- function(x,y){
missings_to_fill <- which(is.na(y))
filled_points <- setdiff(seq_along(y),missings_to_fill)
step1 <- map(missings_to_fill,
~closest_points(.x,filled_points))
names(step1) <- missings_to_fill
step1
step2 <- imap(step1,
~{
xlocal <- as.integer(.y)
x0 <- .x[1]
x1 <- .x[2]
y0 <- y[x0]
y1 <- y[x1]
new_point(x = xlocal,
x0=x0,
x1=x1,
y0=y0,
y1=y1)
})
y[missings_to_fill] <- unlist(step2)
y
}
```

```
(spoiled <- structure(list(x = 2010:2019, y = c(53.142662, 53.565829, 54.623546, 56.682212, 59.495817, 43.86624, 53.134923, NA, 22.206799, NA)), row.names = c(NA, -10L), class = "data.frame"))
fixed <- spoiled
fixed$y <- ipolate(spoiled$x,spoiled$y)
fixed
```

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.