I have fitted regressions by group and stored the resulting models in a data frame; so far so good.
Calling predict on a model pulled out of the data frame works, so the regressions themselves are fine.
I now want to use predict as part of a pipeline.
For each of the four lines in dataAndModel below I want to find the predicted value from the associated model and the single data point x on each line.
There are two issues:
(1) passing the regression model so that predict recognises it inside the pipeline
(2) passing a single data point instead of a data frame
The output I am after looks like this:
x forecastedValueofXforGivenModel
3 13
4 14
5 15
6 16
Thanks in advance for your comments
library(dplyr)
library(stats)
theData <- data.frame(type=c(1,1,2,2),x=c(1,2,3,4), y=c(11,12,13,14))
regressions <- theData %>%
  group_by(type) %>%
  do(myModel = lm(y ~ x, data = .))
regressions #1 <S3: lm> and 2 <S3: lm>
evaluate <- data.frame(type=c(1,1,2,2), x=c(3,4,5,6))
fakeX <- data.frame(x=c(3,4,5,6))
dataAndModel <- evaluate %>%
  merge(x = ., y = regressions)
{ # works so regression is working
  predict(dataAndModel[[1, "myModel"]], fakeX)
}
# now I want to use predict as part of a pipe
dataAndModel %>% # does not work; not expected to, since x is not a data frame
  mutate(yEst = predict(myModel, x))
dataAndModel %>% # does not work
  mutate(yEst = predict(myModel, fakeX))
I am using a slightly different approach with the {purrr} and {tidyr} packages here but the final result is what you expect. Do not hesitate to ask questions if you have any:
# Load packages ----
library(dplyr)
library(purrr)
library(tidyr)
# Create dataset (+ fake dataset) ----
theData <- data.frame(
  type = c(1, 1, 2, 2),
  x = c(1, 2, 3, 4),
  y = c(11, 12, 13, 14),
  fakeX = c(3, 4, 5, 6)
)
# Nest the data by type ----
nestedData <- theData %>%
  group_by(type) %>%
  nest(data = c(y, x), fakeX = fakeX) %>%
  ungroup()
# Run model for each type ----
nestedData <- nestedData %>%
  mutate(
    model = map(.x = data, ~ lm(y ~ x, data = .x))
  )
# Predict values using fake data ----
final_nested <- nestedData %>%
  mutate(
    yEst = map2(
      .x = model,
      .y = fakeX,
      .f = ~ predict(object = .x, newData = .y)
    )
  )
final_nested
# A tibble: 2 × 5
   type data             fakeX            model  yEst
  <dbl> <list>           <list>           <list> <list>
1     1 <tibble [2 × 2]> <tibble [2 × 1]> <lm>   <dbl [2]>
2     2 <tibble [2 × 2]> <tibble [2 × 1]> <lm>   <dbl [2]>
# Select and unnest numeric columns ----
final_nested %>%
  select(type, data, fakeX, yEst) %>%
  unnest(cols = -type)
# A tibble: 4 × 5
   type     y     x fakeX  yEst
  <dbl> <dbl> <dbl> <dbl> <dbl>
1     1    11     1     3    11
2     1    12     2     4    12
3     2    13     3     5    13
4     2    14     4     6    14
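I noticed that the results were incorrect (yEst simply repeated the original y values), and I have learnt a lot about how purrr works along the way.
The code up to and including the regression worked. The code around final_nested needed the following changes:
(1) the argument is newdata, not newData. Because predict accepts extra arguments through ..., the misspelled name was silently ignored and the fitted values for the original data were returned.
(2) predict looks for a column named x inside the new data, so beforehand I created nestedData2, in which the nested data frame has an internal column named "x".
The question I had about -type (bottom of the code) has now gone away.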
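To make point (1) concrete, here is a minimal sketch using only base R and the type 1 rows of the toy data. The names fit, new_points, wrong and right are made up for this illustration.

```r
# Toy data from the thread: within type 1, y = 10 + x exactly
fit <- lm(y ~ x, data = data.frame(x = c(1, 2), y = c(11, 12)))

# The column must be called x so that predict can match it to the formula
new_points <- data.frame(x = c(3, 4))

# Misspelled argument: `newData` is absorbed by `...`, `newdata` stays missing,
# and the fitted values for the training rows are returned without any error
wrong <- predict(fit, newData = new_points)

# Correct spelling: predictions at x = 3 and x = 4
right <- predict(fit, newdata = new_points)

print(unname(wrong)) # fitted values: 11 12
print(unname(right)) # predictions:   13 14
```

The absence of an error is what makes this typo easy to miss, so comparing the output against a value worked out by hand is a cheap check.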
regards
Boffin
library(dplyr)
library(purrr)
library(tidyr)
# Create dataset (+ fake dataset) ----
theData <- data.frame(
  type = c(1, 1, 2, 2),
  x = c(1, 2, 3, 4),
  y = c(11, 12, 13, 14),
  fakeX = c(3, 4, 5, 6)
)
# Nest the data by type ----
nestedData1 <- theData %>%
  select(type, y, x) %>%
  group_by(type) %>%
  nest(data = c(y, x)) %>% # fakeX is no longer nested here; it is handled in nestedData2
  ungroup()
nestedData2 <- theData %>%
  select(type, fakeX) %>%
  group_by(type) %>%
  rename(x = fakeX) %>% # rename fakeX to x
  nest(fakeX = x) %>% # the nested column must be named x internally, else predict will not work
  ungroup()
nestedData <- merge(nestedData1, nestedData2) # the 3 columns in the result are type, data, fakeX
# Run model for each type ----
nestedData <- nestedData %>%
  mutate(
    model = map(.x = data, ~ lm(y ~ x, data = .x))
  )
## R complains about the model because each fit uses only two points and is statistically meaningless; not a problem with many data points
# Predict values using fake data ----
final_nested <- nestedData %>%
  mutate(
    yEst = map2(
      .x = model,
      .y = fakeX, # this line passes 'x' into the model
      .f = ~ predict(object = .x, newdata = .y) # newdata, not newData
    )
  )
final_nested
# check out the modelling
summary(nestedData[[1, "model"]]) # inspect the model for type 1
summary(nestedData[[2, "model"]]) # inspect the model for type 2
predict(object = nestedData[[1, "model"]], newdata = data.frame(x = 6)) # 6*1 + 10 = 16
# Select and unnest numeric columns ----
final_nested %>%
  select(type, fakeX, yEst) %>%
  unnest(cols = -type) # now works
You are right regarding the typo in my code. Indeed, the argument is newdata and not newData. And glad you have been able to make it work to your liking. You should mark your own code as the solution.
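For anyone landing here with the original one-row-per-point layout, the following is a hedged sketch of a more compact route, not the approach used above. It keeps the do() call and the evaluate data frame from the question, joins the models onto the points, and wraps each single x in a one-row data frame before calling predict. The name result is invented for this example, and do() is marked as superseded in current dplyr although it still runs.

```r
library(dplyr)
library(purrr)

theData <- data.frame(type = c(1, 1, 2, 2), x = c(1, 2, 3, 4), y = c(11, 12, 13, 14))
evaluate <- data.frame(type = c(1, 1, 2, 2), x = c(3, 4, 5, 6))

# One lm per type, stored in the list column myModel (as in the question)
regressions <- theData %>%
  group_by(type) %>%
  do(myModel = lm(y ~ x, data = .)) %>%
  ungroup()

# Attach the matching model to every point, then predict row by row.
# Each single value of x is wrapped in a one-row data frame whose column
# is named x, which is what predict needs to match the formula.
result <- evaluate %>%
  left_join(regressions, by = "type") %>%
  mutate(
    yEst = map2_dbl(
      myModel, x,
      ~ unname(predict(.x, newdata = data.frame(x = .y)))
    )
  ) %>%
  select(x, yEst)

print(result) # yEst is 13, 14, 15, 16 for this toy data
```

This addresses both issues from the question: map2_dbl hands predict one model object at a time rather than the whole list column, and data.frame(x = .y) turns the single data point into the data frame that newdata expects.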