I have updated dplyr and am now getting this error:
Error in UseMethod("nest_by") :
no applicable method for 'nest_by' applied to an object of class "function"
mtcars |>
nest_by(cyl)
Can you run this?
Yes, I am able to run this:
mtcars |>
  nest_by(cyl)
cyl data
<list<tibble[,10]>>
1 4 [11 × 10]
2 6 [7 × 10]
3 8 [14 × 10]
Can you provide an example of the code that gives you the error?
groupLM <- sample|>
nest_by(bank_year) |>
mutate(lm_model = list(lm(y ~ x1+x2+x3+x4+x5+x6+x7+x8+x9+x10+x11+x12+x13+x14, d = sample)))
This is a good example of a general lesson: choose good names for your objects, and prefer names that don't clash with base R function names.
base::sample is a function, so when no data frame called sample exists in your session, nest_by() receives that function instead of your data. If you have a data.frame related to some sample, consider a name like sample_df.
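A minimal sketch of how the clash plays out, assuming a fresh session in which no data frame called sample has been created. The name sample_df is only illustrative, and mtcars stands in for the real data:

```r
library(dplyr)

# With no user object named `sample`, the name resolves to base::sample()
class(sample)

# Piping that function into nest_by() reproduces the dispatch error
err_msg <- tryCatch(
  sample |> nest_by(cyl),
  error = function(e) conditionMessage(e)
)
err_msg

# A name that does not shadow a base function avoids the ambiguity
sample_df <- mtcars
nested <- sample_df |> nest_by(cyl)
nested
```

The error text names the class "function", which is usually the hint that a variable name has fallen through to a function of the same name.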
The code now runs, but the results are identical for every group. I believe the result for a single bank-year combination is being copied into all of the regressions.
library(dplyr)
library(tidyverse)
library(broom)
data_5 <- read.csv("data_sample.csv")
y <- data_5$nse_returns
x1 <- data_5$auto
x2 <- data_5$consumer_durables
x3 <- data_5$FMCG
x4 <- data_5$healthcare
x5 <- data_5$IT
x6 <- data_5$media
x7 <- data_5$metal
x8 <- data_5$oil_gas
x9 <- data_5$pharma
x10 <- data_5$reality
x11 <- data_5$finance
x12 <- data_5$Mkt.RF
x13 <- data_5$SMB
x14 <- data_5$HML
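groupLM <- data_5 |>
  nest_by(bank_year) |>
  mutate(lm_model = list(lm(y ~ x1+x2+x3+x4+x5+x6+x7+x8+x9+x10+x11+x12+x13+x14, d = data_5)))
groupLM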
bank_year data lm_model
<list<tibble[,18]>>
1 ALD2018 [246 × 18]
2 ALD2019 [244 × 18]
3 ALD2020 [55 × 18]
4 ANDHRA2018 [246 × 18]
5 ANDHRA2019 [244 × 18]
6 ANDHRA2020 [55 × 18]
7 AUSF2018 [246 × 18]
8 AUSF2019 [244 × 18]
9 AUSF2020 [250 × 18]
10 AUSF2021 [248 × 18]
print(n = ...) to see more rows
groupLM |> reframe(glance(lm_model))
bank_year r.squ…¹ adj.r…² sigma stati…³ p.value df logLik AIC BIC devia…⁴ df.re…⁵ nobs
1 ALD2018 0.0939 0.0936 0.0278 321. 0 14 93823. -1.88e5 -1.87e5 33.5 43343 43358
2 ALD2019 0.0939 0.0936 0.0278 321. 0 14 93823. -1.88e5 -1.87e5 33.5 43343 43358
3 ALD2020 0.0939 0.0936 0.0278 321. 0 14 93823. -1.88e5 -1.87e5 33.5 43343 43358
4 ANDHRA2018 0.0939 0.0936 0.0278 321. 0 14 93823. -1.88e5 -1.87e5 33.5 43343 43358
5 ANDHRA2019 0.0939 0.0936 0.0278 321. 0 14 93823. -1.88e5 -1.87e5 33.5 43343 43358
6 ANDHRA2020 0.0939 0.0936 0.0278 321. 0 14 93823. -1.88e5 -1.87e5 33.5 43343 43358
7 AUSF2018 0.0939 0.0936 0.0278 321. 0 14 93823. -1.88e5 -1.87e5 33.5 43343 43358
8 AUSF2019 0.0939 0.0936 0.0278 321. 0 14 93823. -1.88e5 -1.87e5 33.5 43343 43358
9 AUSF2020 0.0939 0.0936 0.0278 321. 0 14 93823. -1.88e5 -1.87e5 33.5 43343 43358
10 AUSF2021 0.0939 0.0936 0.0278 321. 0 14 93823. -1.88e5 -1.87e5 33.5 43343 43358
print(n = ...) to see more rows
Please format your post.
My advice is to think about the parameter that lm() takes to establish the data it should use. If the nest operation produced an appropriate table and stored it in a list column called data, then it is that column that should be used, and certainly not the entire unnested dataset (data_5).
I gave a similar recommendation when do() was discussed.
> data_5 <- read.csv("data_sample.csv")
> y <- data_5$nse_returns
> x1 <- data_5$auto
> x2 <- data_5$consumer_durables
> x3 <- data_5$FMCG
> x4 <- data_5$healthcare
> x5 <- data_5$IT
> x6 <- data_5$media
> x7 <- data_5$metal
> x8 <- data_5$oil_gas
> x9 <- data_5$pharma
> x10 <- data_5$reality
> x11 <- data_5$finance
> x12 <- data_5$Mkt.RF
> x13 <- data_5$SMB
> x14 <- data_5$HML
> groupLM <- data_5 |>
+ nest_by(bank_year) |>
+ mutate(lm_model = list(lm(y ~ x1+x2+x3+x4+x5+x6+x7+x8+x9+x10+x11+x12+x13+x14, d = data_5)))
> groupLM
# A tibble: 196 × 3
# Rowwise: bank_year
bank_year data lm_model
<chr> <list<tibble[,18]>> <list>
1 ALD2018 [246 × 18] <lm>
2 ALD2019 [244 × 18] <lm>
3 ALD2020 [55 × 18] <lm>
4 ANDHRA2018 [246 × 18] <lm>
5 ANDHRA2019 [244 × 18] <lm>
6 ANDHRA2020 [55 × 18] <lm>
7 AUSF2018 [246 × 18] <lm>
8 AUSF2019 [244 × 18] <lm>
9 AUSF2020 [250 × 18] <lm>
10 AUSF2021 [248 × 18] <lm>
# … with 186 more rows
# ℹ Use `print(n = ...)` to see more rows
> groupLM |> reframe(glance(lm_model))
# A tibble: 196 × 13
bank_year r.squ…¹ adj.r…² sigma stati…³ p.value df logLik AIC BIC devia…⁴ df.re…⁵ nobs
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int>
1 ALD2018 0.0939 0.0936 0.0278 321. 0 14 93823. -1.88e5 -1.87e5 33.5 43343 43358
2 ALD2019 0.0939 0.0936 0.0278 321. 0 14 93823. -1.88e5 -1.87e5 33.5 43343 43358
3 ALD2020 0.0939 0.0936 0.0278 321. 0 14 93823. -1.88e5 -1.87e5 33.5 43343 43358
4 ANDHRA2018 0.0939 0.0936 0.0278 321. 0 14 93823. -1.88e5 -1.87e5 33.5 43343 43358
5 ANDHRA2019 0.0939 0.0936 0.0278 321. 0 14 93823. -1.88e5 -1.87e5 33.5 43343 43358
6 ANDHRA2020 0.0939 0.0936 0.0278 321. 0 14 93823. -1.88e5 -1.87e5 33.5 43343 43358
7 AUSF2018 0.0939 0.0936 0.0278 321. 0 14 93823. -1.88e5 -1.87e5 33.5 43343 43358
8 AUSF2019 0.0939 0.0936 0.0278 321. 0 14 93823. -1.88e5 -1.87e5 33.5 43343 43358
9 AUSF2020 0.0939 0.0936 0.0278 321. 0 14 93823. -1.88e5 -1.87e5 33.5 43343 43358
10 AUSF2021 0.0939 0.0936 0.0278 321. 0 14 93823. -1.88e5 -1.87e5 33.5 43343 43358
# … with 186 more rows, and abbreviated variable names ¹r.squared, ²adj.r.squared, ³statistic,
# ⁴deviance, ⁵df.residual
# ℹ Use `print(n = ...)` to see more rows
> groupLM |> reframe(tidy(lm_model))
# A tibble: 2,940 × 6
bank_year term estimate std.error statistic p.value
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 ALD2018 (Intercept) 1.00 0.000134 7479. 0
2 ALD2018 x1 0.000887 0.000161 5.51 3.52e- 8
3 ALD2018 x2 0.000728 0.000168 4.33 1.51e- 5
4 ALD2018 x3 0.000531 0.000208 2.56 1.06e- 2
5 ALD2018 x4 0.00116 0.000634 1.82 6.86e- 2
6 ALD2018 x5 0.000116 0.000181 0.639 5.23e- 1
7 ALD2018 x6 0.00144 0.0000893 16.2 1.05e-58
8 ALD2018 x7 0.000675 0.000107 6.33 2.40e-10
9 ALD2018 x8 0.00289 0.000185 15.7 3.81e-55
10 ALD2018 x9 0.000389 0.000556 0.699 4.85e- 1
# … with 2,930 more rows
# ℹ Use `print(n = ...)` to see more rows
This is just my opinion, but without extra context from you that would explain or motivate it, copying the columns into separate objects (y, x1-x14) seems both self-defeating and pointless extra work.
Practically, the negative impact is that these objects (y, x1-x14) don't exist as columns in the data_5 that you nest. So when they appear in your lm formula, lm is possibly too smart for its own good: it cannot find them in the data it was given, so it goes directly to the objects of those names in your workspace, and the fit is no longer driven by the nesting at all. You have also kept passing data_5 as the d= argument, when I've told you twice before that this does not work and that it should be the product of the nest...
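A small base R sketch of that lookup behaviour, using mtcars as a stand-in for data_5 (the object names here are illustrative, not from the original data). When the formula's variables are not columns of the data argument, lm() picks them up from the environment of the formula, which is the workspace in this case:

```r
# Free-standing vectors, mirroring the y / x1 assignments above
y  <- mtcars$mpg
x1 <- mtcars$wt

# One group only: the 11 four-cylinder cars
four_cyl <- mtcars[mtcars$cyl == 4, ]

# `four_cyl` has no columns named y or x1, so lm() finds the
# workspace vectors instead and fits on all 32 rows
fit_workspace <- lm(y ~ x1, data = four_cyl)
nobs(fit_workspace)  # 32

# Column names in the formula are resolved inside `data`
fit_group <- lm(mpg ~ wt, data = four_cyl)
nobs(fit_group)      # 11
```

This is the same symptom as the glance() output above, where nobs is 43358 on every row regardless of the group size.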
Question 1) Do you have a requirement to hide the actual variable names and substitute non-descriptive names such as x1-x14?
If you do, we can talk about good approaches, but I would guess that you don't...
There is no requirement to hide the actual variable names. I didn't use the actual names because I didn't want to make the model messy. I am not an expert in R and I don't know many of the conventions. I am really sorry if my silly mistakes are displeasing you. If I use the actual names, will it work?
The actual names are: y is nse_returns, and the x variables are the columns from auto to HML.
date | name | year | bank_year | nse_returns | auto | consumer_durables | FMCG | healthcare | IT | media | metal | oil_gas | pharma | reality | finance | Mkt.RF | SMB | HML | RF |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
01-01-2008 | ALD | 2008 | ALD2008 | 1.09 | 0.871 | -0.528 | 2.199 | -0.097 | -1.308 | 1.599 | 0.195 | -0.26 | -0.255 | 2.907 | 0.234 | 0.02 | 0.01 | -0.01 | 0.01 |
01-02-2008 | ALD | 2008 | ALD2008 | 1.02 | 1.611 | -2.091 | 0.07 | -0.85 | 2.845 | -5.311 | -0.997 | -0.23 | -1.14 | -4.486 | -2.192 | 1.26 | -0.05 | -0.14 | 0.01 |
01-04-2008 | ALD | 2008 | ALD2008 | 1 | -0.812 | 0.94 | 1.871 | -0.649 | -1.065 | -0.586 | -1.967 | 2.471 | -0.979 | -0.638 | -1.127 | 1.95 | -1.41 | 0.19 | 0.01 |
01-07-2008 | ALD | 2008 | ALD2008 | 0.96 | -1.906 | -0.632 | -1.026 | -0.429 | 0.44 | -2.274 | -0.653 | -0.352 | -0.604 | -1.537 | -1.366 | -0.66 | 0.09 | -0.16 | 0.01 |
01-08-2008 | ALD | 2008 | ALD2008 | 1.01 | -1.92 | -0.546 | -1.826 | -0.348 | 1.442 | -0.589 | 0.354 | 0.844 | 0.242 | -1.053 | 1.491 | -1.09 | 0.51 | 0.36 | 0.01 |
I've attempted to go through and apply EconProf's approach to what we understand of your data and model needs. I've tried to be more explicit than is needed, by renaming the result of nest_by and using that name as appropriate within lm().
library(dplyr)
library(broom)

data_5 <- read.csv("data_sample.csv")

# Fit one regression per bank_year, using only that group's rows
groupLM <- data_5 |>
  nest_by(bank_year, .key = "nested_data") |>
  mutate(lm_model = list(lm(
    nse_returns ~ auto + consumer_durables + FMCG + healthcare + IT +
      media + metal + oil_gas + pharma + reality + finance +
      Mkt.RF + SMB + HML,
    data = nested_data
  )))

groupLM |> reframe(glance(lm_model))  # model-level statistics per group
groupLM |> reframe(tidy(lm_model))    # coefficient table per group
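One way to sanity-check this pattern without the original CSV is to run it on mtcars and confirm that each fit only sees its own group. This is a sketch under assumptions: mpg ~ wt is an arbitrary stand-in model, and reframe() needs dplyr 1.1.0 or later:

```r
library(dplyr)

fits <- mtcars |>
  nest_by(cyl, .key = "nested_data") |>
  mutate(lm_model = list(lm(mpg ~ wt, data = nested_data)))

# Number of observations behind each fit
obs_per_fit <- fits |> reframe(n_obs = nobs(lm_model))
obs_per_fit

# Slope on wt for each group
slopes <- fits |> reframe(wt_slope = coef(lm_model)[["wt"]])
slopes
```

If the observation counts match the group sizes and the slopes differ between groups, the models are being fitted on the nested data rather than on the full table.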
Thank you very very much this worked.
Did you post in the wrong thread ?