I am looking for a function in R similar to the ipolate function in Stata (https://www.stata.com/manuals13/dipolate.pdf).
Seems like that function simply fits an `lm()` and then, when there are missing values, takes the filling values from the predicted output of the `lm`. Do you have experience with R's `lm` and `predict`?
x | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 |
---|---|---|---|---|---|---|---|---|---|---|
y | 53.142662 | 53.565829 | 54.623546 | 56.682212 | 59.495817 | 43.86624 | 53.134923 | | 22.206799 | |
y1 | 53.142662 | 53.565829 | 54.623546 | 56.682212 | 59.495817 | 43.86624 | 53.134923 | 37.670861 | 22.206799 | 6.7427376 |
`y` has missing values (2017 and 2019); after using ipolate in Stata, we get `y1`.
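Checking the two filled values by hand suggests that ipolate draws a straight line between the neighbouring observed years rather than one regression line through all of them. The 2019 value, which lies past the last observation, appears to be a linear extrapolation of the 2016 to 2018 segment (as far as I know, Stata only extrapolates when the `epolate` option is given, so that option is assumed here):

$$
y_{2017} = y_{2016} + \frac{y_{2018} - y_{2016}}{2018 - 2016}\,(2017 - 2016) = \frac{53.134923 + 22.206799}{2} = 37.670861
$$

$$
y_{2019} = y_{2018} + \frac{y_{2018} - y_{2016}}{2018 - 2016}\,(2019 - 2018) = 22.206799 - 15.464062 \approx 6.742737
$$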
I would really appreciate it if you could show how to do this in R.
I'm not a Stata user, but the documentation for the function you linked to implied to me that the values would go vertically rather than horizontally, as you illustrate them here?
Assuming your data are arranged vertically, you can follow my example:
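```r
# example data: y is missing in rows 3 and 7
(spoiled <- structure(list(x = 1:10, y = c(
  1.41, 3.71, NA, 8.31, 10.61, 12.91,
  NA, 5, 19.81, 22.11
)), row.names = c(NA, -10L), class = "data.frame"))

# fit one straight line to the observed points (rows with NA y are dropped)
lm_1 <- lm(y ~ x, data = spoiled)

fixed <- spoiled
# predicted value from the fitted line for every row
fixed$y_lm <- predict(lm_1, newdata = spoiled)
# keep the observed values and fill only the missing ones with the prediction
fixed$y_fin <- ifelse(is.na(fixed$y), fixed$y_lm, fixed$y)
fixed

subset(fixed, select = c(x, y_fin))
```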
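If the data start out as a single horizontal row (for example, pasted from Excel with the years as column names), it can be turned into the vertical layout used above before calling `lm()` or any fill function. A minimal base-R sketch, assuming one row whose column names are the years; the objects `wide` and `long` are illustrative names, and the values are the ones from the question:

```r
# Hypothetical example: one horizontal row as it might arrive from Excel,
# with the years as column names (check.names = FALSE keeps "2010" etc.)
wide <- data.frame(
  `2010` = 53.142662, `2011` = 53.565829, `2012` = 54.623546,
  `2013` = 56.682212, `2014` = 59.495817, `2015` = 43.86624,
  `2016` = 53.134923, `2017` = NA, `2018` = 22.206799, `2019` = NA,
  check.names = FALSE
)

# Vertical layout: one column for the year (x) and one for the value (y)
long <- data.frame(
  x = as.integer(names(wide)),
  y = as.numeric(unlist(wide[1, ]))
)
long
```

Each column of the wide row becomes one row of `long`, so the column names carry the x values and the cells carry y.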
I shared the data horizontally because I copied it from Excel.
However, when I follow your instructions using my data:
```r
(spoiled <- structure(list(
  x = 2010:2019,
  y = c(53.142662, 53.565829, 54.623546, 56.682212, 59.495817,
        43.86624, 53.134923, NA, 22.206799, NA)
), row.names = c(NA, -10L), class = "data.frame"))
lm_1 <- lm(y ~ x, data = spoiled)
fixed <- spoiled
fixed$y_lm <- predict(lm_1, newdata = spoiled)
fixed$y_fin <- ifelse(is.na(fixed$y), fixed$y_lm, fixed$y)
fixed
subset(fixed, select = c(x, y_fin))
```
I am not getting the same result as the ipolate function in Stata. That is, `y_fin` should be: `c(53.142662, 53.565829, 54.623546, 56.682212, 59.495817, 43.86624, 53.134923, 37.670861, 22.206799, 6.7427376)`
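The target vector is consistent with filling each gap from its neighbouring observed points instead of from one global regression line. A base-R sketch that reproduces it under two assumptions: gaps inside the observed range use `approx()` (a straight line between the neighbouring observed points), and values before the first or after the last observation are extended along the line through the two nearest observed points, which matches the 2019 value above and appears to be what Stata's `epolate` option does. The name `ipolate_base` and its `epolate` argument are made up for this example:

```r
# Hypothetical helper (not from base R or Stata): linear interpolation with
# optional linear extrapolation at the ends, meant to mimic
# `ipolate y x, generate(y1) epolate` in Stata. Needs at least two observed y.
ipolate_base <- function(x, y, epolate = TRUE) {
  known <- !is.na(y)
  ord <- order(x[known])
  xk <- x[known][ord]
  yk <- y[known][ord]
  # interior gaps: straight line between the neighbouring observed points;
  # rule = 1 leaves NA outside the observed x range
  out <- approx(xk, yk, xout = x, rule = 1)$y
  n <- length(xk)
  if (epolate && n >= 2) {
    before <- x < xk[1]
    after <- x > xk[n]
    # extend the line through the first two / last two observed points
    out[before] <- yk[1] + (yk[2] - yk[1]) / (xk[2] - xk[1]) * (x[before] - xk[1])
    out[after] <- yk[n] + (yk[n] - yk[n - 1]) / (xk[n] - xk[n - 1]) * (x[after] - xk[n])
  }
  out[known] <- y[known]  # observed values are never changed
  out
}

# the data from the question
x <- 2010:2019
y <- c(53.142662, 53.565829, 54.623546, 56.682212, 59.495817,
       43.86624, 53.134923, NA, 22.206799, NA)
ipolate_base(x, y)
```

Unlike the `lm()` approach, each gap is filled only from its neighbours, so observations far from a gap (such as the 2015 dip) do not shift the filled value. Setting `epolate = FALSE` leaves 2019 as `NA`.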
Here is my implementation:
```r
library(purrr)

# Positions of the two observed points nearest (by position) to the missing
# position x1; y1 holds the positions of all observed values.
closest_points <- function(x1, y1) {
  diffs <- abs(x1 - y1)
  names(diffs) <- y1
  sort(as.integer(names(head(sort(diffs), 2))))
}

# Value at x on the straight line through (x0, y0) and (x1, y1)
new_point <- function(x, x0, y0, x1, y1) {
  ((y1 - y0) / (x1 - x0)) * (x - x0) + y0
}

ipolate <- function(x, y) {
  missings_to_fill <- which(is.na(y))
  filled_points <- setdiff(seq_along(y), missings_to_fill)
  # For each missing position, find the two nearest observed positions
  step1 <- map(missings_to_fill, ~ closest_points(.x, filled_points))
  names(step1) <- missings_to_fill
  # Evaluate the line through those two points using the actual x values,
  # so that unequal spacing in x is taken into account
  step2 <- imap(step1, ~ {
    i <- as.integer(.y)
    i0 <- .x[1]
    i1 <- .x[2]
    new_point(x = x[i], x0 = x[i0], x1 = x[i1], y0 = y[i0], y1 = y[i1])
  })
  y[missings_to_fill] <- unlist(step2)
  y
}

(spoiled <- structure(list(
  x = 2010:2019,
  y = c(53.142662, 53.565829, 54.623546, 56.682212, 59.495817,
        43.86624, 53.134923, NA, 22.206799, NA)
), row.names = c(NA, -10L), class = "data.frame"))

fixed <- spoiled
fixed$y <- ipolate(spoiled$x, spoiled$y)
fixed
```
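One thing to watch with this implementation: `closest_points()` returns the two nearest observed positions, which are not always on opposite sides of a gap. For `y = c(1, 2, 3, NA, NA, NA, 10)`, the two nearest observed positions to index 4 are 3 and 2, so that value would be extrapolated from the 2-to-3 segment instead of interpolated toward position 7. A possible replacement is sketched below under the hypothetical name `bracketing_points()`; it takes the same arguments as `closest_points()`, so it could be swapped into `ipolate()` in its place:

```r
# Hypothetical alternative to closest_points(): for a missing position i, use
# the nearest observed position on each side when both exist (interpolation)
# and fall back to the two nearest on one side only at the ends (extrapolation).
bracketing_points <- function(i, filled) {
  left <- filled[filled < i]
  right <- filled[filled > i]
  if (length(left) > 0 && length(right) > 0) {
    c(max(left), min(right))
  } else if (length(right) == 0) {
    tail(sort(left), 2)   # gap after the last observation
  } else {
    head(sort(right), 2)  # gap before the first observation
  }
}

bracketing_points(4, c(1, 2, 3, 7))  # nearest on each side: 3 and 7
bracketing_points(10, c(1:7, 9))     # past the end: 7 and 9
```

For the data in this thread both functions pick the same points, so the results above are unchanged; the difference only shows up with runs of several missing values.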