Did you read the article? It has been posted three times in the thread, yet you still haven't done it.
`datapasta::df_paste(head(NewData, 10)[, c('gvkey', 'datadate','cshoc','prccd')])`
data.frame(
gvkey = c(7041, 7041, 7041, 7041, 7041, 7041, 7041, 7041, 7041, 7041),
datadate = c(20000103,20000104,20000105,20000106,
20000111,20000112,20000113,20000114,20000117,20000118),
cshoc = c(4.2e+07,4.2e+07,4.2e+07,4.2e+07,4.2e+07,
4.2e+07,4.2e+07,4.2e+07,4.2e+07,4.2e+07),
prccd = c(3.86, 4.28, 4, 4.04, 4, 3.96, 3.92, 3.96, 4.06, 4.14)
)
Is this a good example?.. they are all the same gvkey, and all entries have unique datadates ?
NewData<- data.frame(
gvkey = c(7041, 7041, 7041, 7041, 7041, 7041, 7041, 7041, 7041, 7041),
datadate = c(20000103,20000104,20000105,20000106,
20000111,20000112,20000113,20000114,20000117,20000118),
cshoc = c(4.2e+07,4.2e+07,4.2e+07,4.2e+07,4.2e+07,
4.2e+07,4.2e+07,4.2e+07,4.2e+07,4.2e+07),
prccd = c(3.86, 4.28, 4, 4.04, 4, 3.96, 3.92, 3.96, 4.06, 4.14)
)
mutate(NewData,
firmMarket=cshoc*prccd)%>%
group_by(gvkey,datadate)%>%
mutate(TotalMarket=sum(firmMarket,na.rm=TRUE))%>%
mutate(weightage=firmMarket/TotalMarket)
# A tibble: 10 x 7
# Groups: gvkey, datadate [10]
gvkey datadate cshoc prccd firmMarket TotalMarket weightage
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 7041 20000103 42000000 3.86 162120000 162120000 1
2 7041 20000104 42000000 4.28 179760000 179760000 1
3 7041 20000105 42000000 4 168000000 168000000 1
4 7041 20000106 42000000 4.04 169680000 169680000 1
5 7041 20000111 42000000 4 168000000 168000000 1
6 7041 20000112 42000000 3.96 166320000 166320000 1
7 7041 20000113 42000000 3.92 164640000 164640000 1
8 7041 20000114 42000000 3.96 166320000 166320000 1
9 7041 20000117 42000000 4.06 170520000. 170520000. 1
10 7041 20000118 42000000 4.14 173880000 173880000 1
There are more than 1000 gvkeys with different entries in the data set and have unique datadate. But in this example, if you see the value for TotlaMaket is the same as firmMarket. And when I run your code
mutate(NewData,
firmMarket=cshoc*prccd)%>%
group_by(gvkey,datadate)%>%
mutate(TotalMarket=sum(firmMarket,na.rm=TRUE))%>%
mutate(weightage=firmMarket/TotalMarket)%>%
mutate(PriceWeightage=weightage*prccd)%>%
group_by(gvkey,datadate)%>%
mutate(MarketIndex=sum(PriceWeightage,na.rm = TRUE))
I get this output
# A tibble: 1,048,575 x 9
# Groups: gvkey, datadate [1,031,402]
gvkey datadate cshoc prccd firmMarket TotalMarket weightage PriceWeightage MarketIndex
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 7041 20000103 42000000 3.86 162120000 4.33e15 0.0000000375 0.000000145 17.7
2 7041 20000104 42000000 4.28 179760000 4.33e15 0.0000000415 0.000000178 17.7
3 7041 20000105 42000000 4 168000000 4.33e15 0.0000000388 0.000000155 17.7
4 7041 20000106 42000000 4.04 169680000 4.33e15 0.0000000392 0.000000158 17.7
5 7041 20000111 42000000 4 168000000 4.33e15 0.0000000388 0.000000155 17.7
6 7041 20000112 42000000 3.96 166320000 4.33e15 0.0000000384 0.000000152 17.7
7 7041 20000113 42000000 3.92 164640000 4.33e15 0.0000000380 0.000000149 17.7
8 7041 20000114 42000000 3.96 166320000 4.33e15 0.0000000384 0.000000152 17.7
9 7041 20000117 42000000 4.06 170520000. 4.33e15 0.0000000394 0.000000160 17.7
10 7041 20000118 42000000 4.14 173880000 4.33e15 0.0000000402 0.000000166 17.7
# ... with 1,048,565 more rows
I don't know what is missing to calculate TotalMarket values.
totalmarket is the same as full markey because its a summation by the groups gvkey AND datadate.
to not use datadate to split the groups simply remove it
mutate(NewData,
firmMarket=cshoc*prccd)%>%
group_by(gvkey)%>%
mutate(TotalMarket=sum(firmMarket,na.rm=TRUE))%>%
mutate(weightage=firmMarket/TotalMarket)
# A tibble: 10 x 7
# Groups: gvkey [1]
gvkey datadate cshoc prccd firmMarket TotalMarket weightage
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 7041 20000103 42000000 3.86 162120000 1689240000 0.0960
2 7041 20000104 42000000 4.28 179760000 1689240000 0.106
3 7041 20000105 42000000 4 168000000 1689240000 0.0995
4 7041 20000106 42000000 4.04 169680000 1689240000 0.100
5 7041 20000111 42000000 4 168000000 1689240000 0.0995
6 7041 20000112 42000000 3.96 166320000 1689240000 0.0985
7 7041 20000113 42000000 3.92 164640000 1689240000 0.0975
8 7041 20000114 42000000 3.96 166320000 1689240000 0.0985
9 7041 20000117 42000000 4.06 170520000. 1689240000 0.101
10 7041 20000118 42000000 4.14 173880000 1689240000 0.103
To make TotalMarket values more clear, please see the image. In this, we have different gvkeys with datadates and all data will have sum based on datadate. Also if you see a different gvkey with the same date have the same values. So I want TotlaMarket value should be the same for the same date for different gvkeys.
I hope I am able to make my question clear.
Thanks
you are saying you want the total to be the sum of firm values by date (not by gvkey)
mutate(NewData,
firmMarket=cshoc*prccd)%>%
group_by(datadate)%>%
mutate(TotalMarket=sum(firmMarket,na.rm=TRUE))%>%
mutate(weightage=firmMarket/TotalMarket)
No no, let see this way ,
if i do sum of firmMarket values based on gvkey and datadate and store them in TotlaMarket. So for all gvkey and datadate all values will be stored in TotalMarket and value will be same for same date with different gvkey.
This contradicts different gvkeys having the same TotalMarkey values on matching days...
Yes, want to keep same values for different gvkeys for matching days. Is it possible to do.!!!
Thanks
yes, I already showed you, but it seems you didnt understand
if is use your mentioned code,
this is my output
# A tibble: 1,048,575 x 7
# Groups: gvkey [288]
gvkey datadate cshoc prccd firmMarket TotalMarket weightage
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 7041 20000103 42000000 3.86 162120000 4.33e15 0.0000000375
2 7041 20000104 42000000 4.28 179760000 4.33e15 0.0000000415
3 7041 20000105 42000000 4 168000000 4.33e15 0.0000000388
4 7041 20000106 42000000 4.04 169680000 4.33e15 0.0000000392
5 7041 20000111 42000000 4 168000000 4.33e15 0.0000000388
6 7041 20000112 42000000 3.96 166320000 4.33e15 0.0000000384
7 7041 20000113 42000000 3.92 164640000 4.33e15 0.0000000380
8 7041 20000114 42000000 3.96 166320000 4.33e15 0.0000000384
9 7041 20000117 42000000 4.06 170520000. 4.33e15 0.0000000394
10 7041 20000118 42000000 4.14 173880000 4.33e15 0.0000000402
# ... with 1,048,565 more rows
i am not understanding why TotlaMarket values are coming in "4.33e15" for all. But as per your output, it should be "168924000".
You refer now to code I posted before your statements about total market being across gvkey by grouping datadate only, I provided different code after those statements. It is those codes which would show the same TotalMarket value for every matching datadate across varying gvkeys... (yet you complained about them also, in a way that eludes my understanding)
Can you please copy paste new code here.. a lot of confusion. Sorry for that and my output is not coming same as you old code too.
NewData<- data.frame(
gvkey = c(7041, 7041, 7041, 7041, 7041, 7041, 7041, 7041, 7041, 7041),
datadate = c(20000103,20000104,20000105,20000106,
20000111,20000112,20000113,20000114,20000117,20000118),
cshoc = c(4.2e+07,4.2e+07,4.2e+07,4.2e+07,4.2e+07,
4.2e+07,4.2e+07,4.2e+07,4.2e+07,4.2e+07),
prccd = c(3.86, 4.28, 4, 4.04, 4, 3.96, 3.92, 3.96, 4.06, 4.14)
)
mutate(NewData,
firmMarket=cshoc*prccd)%>%
group_by(datadate)%>%
mutate(TotalMarket=sum(firmMarket,na.rm=TRUE))%>%
mutate(weightage=firmMarket/TotalMarket)
# A tibble: 10 x 7
# Groups: datadate [10]
gvkey datadate cshoc prccd firmMarket TotalMarket weightage
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 7041 20000103 42000000 3.86 162120000 162120000 1
2 7041 20000104 42000000 4.28 179760000 179760000 1
3 7041 20000105 42000000 4 168000000 168000000 1
4 7041 20000106 42000000 4.04 169680000 169680000 1
5 7041 20000111 42000000 4 168000000 168000000 1
6 7041 20000112 42000000 3.96 166320000 166320000 1
7 7041 20000113 42000000 3.92 164640000 164640000 1
8 7041 20000114 42000000 3.96 166320000 166320000 1
9 7041 20000117 42000000 4.06 170520000. 170520000. 1
10 7041 20000118 42000000 4.14 173880000 173880000 1
Ok great, thanks i understand, but now when i use your code, i am getting some different output.
# Groups: gvkey, datadate [1,031,402]
gvkey datadate cshoc prccd firmMarket TotalMarket weightage PriceWeightage MarketIndex
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 7041 20000103 42000000 3.86 162120000 4.33e15 0.0000000375 0.000000145 17.7
2 7041 20000104 42000000 4.28 179760000 4.33e15 0.0000000415 0.000000178 17.7
3 7041 20000105 42000000 4 168000000 4.33e15 0.0000000388 0.000000155 17.7
4 7041 20000106 42000000 4.04 169680000 4.33e15 0.0000000392 0.000000158 17.7
5 7041 20000111 42000000 4 168000000 4.33e15 0.0000000388 0.000000155 17.7
6 7041 20000112 42000000 3.96 166320000 4.33e15 0.0000000384 0.000000152 17.7
7 7041 20000113 42000000 3.92 164640000 4.33e15 0.0000000380 0.000000149 17.7
8 7041 20000114 42000000 3.96 166320000 4.33e15 0.0000000384 0.000000152 17.7
9 7041 20000117 42000000 4.06 170520000. 4.33e15 0.0000000394 0.000000160 17.7
10 7041 20000118 42000000 4.14 173880000 4.33e15 0.0000000402 0.000000166 17.7
# ... with 1,048,565 more rows
How I can solve this issue. My TotalMarket value is coming different. I hope this will be my last question
my totalmarket is based on 10 rows, yours on 1million, therefore you should not expect it to be literally the same number...
I cant comment on whether you have done things correctly or not, because there are no other gvkeys shown that would indicate whether totalmarket matches from them across datadates....
However, it does seem like you have simply not implemented my suggestion, my code fragment, involves
# Groups: datadate [10]
yours
gvkey, datadate [1,031,402]
perhaps your wrote more code after my fragment to set different group statements, perhaps you didnt, I can't know. I can only see that it differs.
Thanks for your input.