Having Issues with Loop in IV Regression

Tprado · January 15, 2023, 4:48pm

I need to return all the statistics from an IV regression in a data.frame, but computed separately for each ID, since I am running one regression per product.

In this example I have 6 different IDs.

List

sku_list = unique(df_log$Sku)
sku_list

IV Equation

ivreg(Y ~ X + W | W + Z)

Run the model for each SKU in data frame

regressions <-c()
for(i in sku_list){

regressions[i] <- estimatr::iv_robust(df_log[df_log$Sku==i,]$Log_TotalSold ~
df_log[df_log$Sku==i,]$Log_Avprice |
df_log[df_log$Sku==i,]$Log_cost,data = df_log)

}

Getting the list of coefficients

m_list <- list(IV = regressions)
m_list

williaml · January 16, 2023, 12:46am

Hi, you could try using the broom package to get the results in a data frame.

Tidy a(n) ivreg object — tidy.ivreg • broom (tidymodels.org)

In terms of the loop, it is difficult to help without a reproducible example.

FAQ: How to do a minimal reproducible example ( reprex ) for beginners Guides & FAQs

A minimal reproducible example consists of the following items: a minimal dataset, necessary to reproduce the issue; and the minimal runnable code necessary to reproduce the issue, which can be run on the given dataset and includes the necessary information on the packages used. The minimal dataset (sample data) should be a data frame that is small enough to be (reasonably) pasted in a post, but big enough to reproduce your issue. …
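One common way to share such a minimal dataset is `dput()`, which prints R code that recreates an object. The sketch below is only an illustration: `sample_df` and its values are invented, and only the column names are borrowed from this thread.

```r
# Hypothetical miniature of the data, using the thread's column names;
# the numbers are invented purely for illustration
sample_df <- data.frame(
  Sku           = c("A", "A", "A", "B", "B", "B"),
  Log_TotalSold = c(2.5, 2.25, 2.75, 3.5, 3.25, 3.75),
  Log_Avprice   = c(1.5, 1.75, 1.25, 2.5, 2.75, 2.25),
  Log_cost      = c(1, 1.25, 0.75, 2, 2.25, 1.75)
)

# dput() prints R code that recreates the object; paste that text into the post
dput(sample_df)

# On real data a small slice is usually enough, e.g. dput(head(df_log, 20))

# Anyone reading the post can rebuild the data from the pasted text
pasted  <- paste(capture.output(dput(sample_df)), collapse = "\n")
rebuilt <- eval(parse(text = pasted))
```

Sharing the `dput()` text instead of a file means readers do not need access to the original CSV.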

Tprado · January 16, 2023, 12:28pm

Hi williaml, unfortunately that did not work.

How can I share my dataset with you?

This is my entire code. If I run just one regression, I can get all the stats for the model.
However, I need to run ivreg separately for each product, not one regression across all the products.

pacotes <- c("kableExtra","reshape2","tidyverse","ggsci","gridExtra","lubridate",
"data.table","estimatr","modelsummary","broom","lmtest","OIdata","sandwich","doBy","openintro","ivpack","arm","stargazer","jtools","plm","car","caret","carData","dplyr","AER")

Install all the needed packages to deploy the model

if(sum(as.numeric(!pacotes %in% installed.packages()))!= 0){
instalador <- pacotes[!pacotes %in% installed.packages()]
for (i in 1: length(instalador)) {
install.packages(instalador,dependencies = T)
break()}
sapply(pacotes, require, character = T)
} else {
sapply(pacotes,require,character = T)
}

Bringing the csv to R

dados <- read.csv("C:/Users/thale/Desktop/20231101_Elasticity.csv",
sep = ";")
str(dados) ## Verify the structure of the dataset
View(head(dados)) ## Check all the basket based on the dataset

Working on some cleaning

dados2 <- dados[,!names(dados) %in% c("X", "X.1")]
dados2 <- dados2 %>%
mutate(SalesDate = as.Date(SalesDate, "%d/%m/%Y"))
str(dados2)
dim(dados2)

Remove NAs from the dataset

dados2 <- dados2 %>% drop_na()

Create log data.frame

df_log <- data.frame(Sales_date=dados2$SalesDate,
Sku=dados2$Sku,
Log_Revenue=log(dados2$Revenue),
Log_TotalSold=log(dados2$TotalSold),
Log_Avprice=log(dados2$AvPrice),
Log_inflation=log(dados2$Inflation),
Log_cost=log(dados2$Cost)
)
df_log <- df_log %>% drop_na() ## df_log must exist before NAs can be dropped from it

Pricing variance

pricevar <- ggplot(dados2,aes(x = SalesDate, y = AvPrice, group = Sku)) +
geom_line(aes(color = Sku)) +
theme_minimal() +
ylim(0,320) +
scale_x_date(date_breaks = "2 months") +
labs(title = "Price Variance over time")
pricevar

Price data exploration 2

df_pricestr = dados2 %>%
group_by(Sku) %>%
summarise(min_price = min(AvPrice),
mean_price = mean(AvPrice),
max_price = max(AvPrice))
df_pricestr_l = melt(df_pricestr, id = 'Sku')

Data visuals for min, max, av price.

p_price_arch = ggplot(df_pricestr_l, aes(x = Sku, y = value, group = variable)) +
geom_point(aes(color = variable), size = 4, shape = 20) +
theme_minimal() +
theme(axis.text.x = element_text(size = 10),
axis.text.y = element_text(size = 10),
legend.text = element_text(size = 10)) +
scale_color_npg(breaks=c("max_price","mean_price","min_price")) +
labs(title = 'Price Architecture',
x = "",
y = 'Price Structure') +
geom_text(aes(label=round(value, 2)),hjust=-0.3, vjust=-1)
p_price_arch

Running IV to address endogeneity

The main causes are omitted-variable bias and simultaneity

With endogeneity in the model, the betas are no longer good estimators

Simultaneity (y = a + b1 * x + u) - Y affects X, when it should be X and the error term u affecting Y

Omitted-variable bias (y = a + b1 * x + u) - The error term (which contains the omitted variable) affects X, so b1 captures not only the effect of x on y but also part of the error term

Y = outcome variable (log quantity)

X = Endogenous variable (Log price)

Z = Instruments (Inflation, Cost)

W = Any exogenous variables not included in the instruments

List

sku_list = unique(df_log$Sku)
sku_list

IV Equation

ivreg(Y ~ X + W | W + Z)
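Mapped onto this thread's variables, the template could read as below. This is a minimal sketch on simulated data: all numbers are invented, it assumes the AER package is installed, and it assumes there are no extra exogenous regressors W. It also compares the IV point estimates with a manual two-stage fit.

```r
library(AER)  # provides ivreg()

# Simulated data; the shock u makes price endogenous (illustration only)
set.seed(3)
n <- 200
Log_cost      <- rnorm(n)
Log_inflation <- rnorm(n)
u             <- rnorm(n)
Log_Avprice   <- 0.7 * Log_cost + 0.3 * Log_inflation + 0.5 * u + rnorm(n)
Log_TotalSold <- 2 - 1.2 * Log_Avprice + u
d <- data.frame(Log_TotalSold, Log_Avprice, Log_cost, Log_inflation)

# Y ~ X | Z : Y = Log_TotalSold, X = Log_Avprice, Z = Log_cost + Log_inflation
iv_fit <- ivreg(Log_TotalSold ~ Log_Avprice | Log_cost + Log_inflation, data = d)

# Manual two-stage version: regress X on Z, then Y on the fitted X
d$price_hat <- fitted(lm(Log_Avprice ~ Log_cost + Log_inflation, data = d))
stage2 <- lm(Log_TotalSold ~ price_hat, data = d)

# The point estimates should agree; the standard errors of stage2 are NOT valid
same_coefs <- all.equal(unname(coef(iv_fit)), unname(coef(stage2)))
same_coefs
```

Note that variable names go straight into the formula and the data frame goes into `data`; nothing in the formula uses `$`.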

Run the model for each SKU in data frame

regressions <-c()
for(i in sku_list){

regressions[i] <- ivreg(df_log[df_log$Sku==i,]$Log_TotalSold ~
df_log[df_log$Sku==i,]$Log_Avprice |
df_log[df_log$Sku==i,]$Log_cost,data = df_log)

}

Getting the list of coefficients

m_list <- list(IV = regressions)
m_list

nirgrahamuk · January 16, 2023, 2:21pm

df_log |> split(~Sku) |>
  map(\(x) ivreg(Log_TotalSold ~
              Log_Avprice |
              Log_cost,data = x))
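
To get from that list of models to the data frame of statistics asked for in the first post, the result can be passed through broom, as suggested earlier in the thread. A minimal sketch, assuming simulated data with this thread's column names (the values are made up) and that AER, broom, purrr and dplyr are installed:

```r
library(AER)    # ivreg()
library(broom)  # tidy(), glance()
library(purrr)  # map()

# Simulated stand-in for df_log; values are invented for illustration only
set.seed(1)
n <- 60
df_log <- data.frame(
  Sku      = rep(c("A", "B", "C"), each = n),
  Log_cost = rnorm(3 * n)
)
u <- rnorm(3 * n)  # unobserved shock that makes price endogenous
df_log$Log_Avprice   <- 0.8 * df_log$Log_cost + 0.5 * u + rnorm(3 * n)
df_log$Log_TotalSold <- 2 - 1.2 * df_log$Log_Avprice + u

# One IV regression per SKU, kept in a list named by SKU
models <- df_log |>
  split(~Sku) |>
  map(\(x) ivreg(Log_TotalSold ~ Log_Avprice | Log_cost, data = x))

# Coefficient-level statistics: one row per SKU and term
coef_stats <- dplyr::bind_rows(map(models, tidy), .id = "Sku")

# Model-level statistics: one row per SKU
model_stats <- dplyr::bind_rows(map(models, glance), .id = "Sku")
```

Inside the anonymous function, `x` is already the subset for one SKU, so the formula uses bare column names and `df_log` does not appear.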

Tprado · January 17, 2023, 12:33pm

Did not work

Output.

nirgrahamuk · January 17, 2023, 4:04pm

Your output tells me you didn't use my code; perhaps you adopted my use of split, but didn't rewrite the rest of it to work?
The $ signs are a giveaway... as is the presence of df_log at all...
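
For anyone who prefers to keep the original for loop, the same idea applies there: subset the data first, pass the subset through `data`, and use bare column names in the formula. Results also need to go into a list via `[[ ]]`, because `regressions[i] <-` on a vector created with `c()` does not store a whole fitted model. A minimal sketch, assuming simulated data with this thread's column names (values invented) and that AER is installed:

```r
library(AER)  # ivreg()

# Simulated stand-in for df_log; values are invented for illustration only
set.seed(2)
n <- 60
df_log <- data.frame(
  Sku      = rep(c("A", "B", "C"), each = n),
  Log_cost = rnorm(3 * n)
)
u <- rnorm(3 * n)
df_log$Log_Avprice   <- 0.8 * df_log$Log_cost + 0.5 * u + rnorm(3 * n)
df_log$Log_TotalSold <- 2 - 1.2 * df_log$Log_Avprice + u

# Character keys so the list can be indexed by SKU name even if SKUs are numeric
sku_list <- as.character(unique(df_log$Sku))

# A list (not c()) can hold one complete model object per SKU
regressions <- vector("list", length(sku_list))
names(regressions) <- sku_list

for (i in sku_list) {
  sku_data <- df_log[df_log$Sku == i, ]          # rows for this SKU only
  regressions[[i]] <- ivreg(Log_TotalSold ~ Log_Avprice | Log_cost,
                            data = sku_data)     # bare names, subset as data
}

# Coefficients side by side, one column per SKU
coef_table <- sapply(regressions, coef)
coef_table
```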
