Hello Community,
I was merging two datasets, one which is on firm-level (COMPUSTAT) and one which shows issued bonds as a sequence after they occurred (FISD Mergent).
After matching, I have a list of firm-quarter observations to which the respective bonds are matched. For dates/ quarters where the company had two or three bonds outstanding, I have 2-3 rows respectively.
My goal is to group/aggregate the dataset, that for each datadate a company has only one line, and the bond-information is summed up, whereas all other information is taken over as-is.
My data, in short, looks like this:
Company, Bond_key, Datadate, Assets, Bond_Amount
APPLE, 12345AB, 20080231, 500, 50
APPLE, 12345AB, 20090531, 550, 50
APPLE, 6789CD, 20090531, 550, 100
APPLE, 6789CD, 20101231, 600, 100
My goal is to aggregate the following way:
Company, Bond_key, Datadate, Assets, Bond_Amount
APPLE, 12345AB, 20080231, 500, 50
APPLE, ----------- , 20090531, 550, 150
APPLE, 6789CD, 20101231, 600, 100
I have already worked with group_by from the {dplyr} package, and I basically get what I want (a specific ratio which combines firm-level and bond-data), however, I only have the variables specified in my code in the final output dataset, although I want to take over all variables of my dataset (450 firm-level, 220 bond-level variables). Is there a way to use the group_by and apply it to many variables in an efficient why?
file03 <- file02 %>%
group_by(LPERMCO, datadate) %>%
summarise(issues = sum(OFFERING_AMT)/1000, TDEBT = mean(TDEBT)) %>%
mutate(BONDRATIO = issues / TDEBT)
*Note that the two datasets have different unit measures, that is why I adjust with / 1000.
Secondly, I used the aggregate code from {dplyr} and it gives me the right result if I look at one company only. Is there a way to loop this code over all companies in the dataset?
aggregate(OFFERING_AMT~datadate,data=file02_filter,FUN=sum)
I appreciate your help and also other approaches to solve the problem.
Thank you very much!