Aggregate a dataset based on company and datadate; keep ALL variables of a merged file

bernahgrl · July 2, 2020, 7:30am

Hello Community,

I was merging two datasets, one which is on firm-level (COMPUSTAT) and one which shows issued bonds as a sequence after they occurred (FISD Mergent).
After matching, I have a list of firm-quarter observations to which the respective bonds are matched. For dates/ quarters where the company had two or three bonds outstanding, I have 2-3 rows respectively.

My goal is to group/aggregate the dataset, that for each datadate a company has only one line, and the bond-information is summed up, whereas all other information is taken over as-is.

My data, in short, looks like this:
Company, Bond_key, Datadate, Assets, Bond_Amount
APPLE, 12345AB, 20080231, 500, 50
APPLE, 12345AB, 20090531, 550, 50
APPLE, 6789CD, 20090531, 550, 100
APPLE, 6789CD, 20101231, 600, 100

My goal is to aggregate the following way:
Company, Bond_key, Datadate, Assets, Bond_Amount
APPLE, 12345AB, 20080231, 500, 50
APPLE, ----------- , 20090531, 550, 150
APPLE, 6789CD, 20101231, 600, 100

I have already worked with group_by from the {dplyr} package, and I basically get what I want (a specific ratio which combines firm-level and bond-data), however, I only have the variables specified in my code in the final output dataset, although I want to take over all variables of my dataset (450 firm-level, 220 bond-level variables). Is there a way to use the group_by and apply it to many variables in an efficient why?

file03 <- file02 %>%
  group_by(LPERMCO, datadate) %>%
  summarise(issues = sum(OFFERING_AMT)/1000, TDEBT = mean(TDEBT)) %>%
  mutate(BONDRATIO = issues / TDEBT)

*Note that the two datasets have different unit measures, that is why I adjust with / 1000.

Secondly, I used the aggregate code from {dplyr} and it gives me the right result if I look at one company only. Is there a way to loop this code over all companies in the dataset?

  aggregate(OFFERING_AMT~datadate,data=file02_filter,FUN=sum)

I appreciate your help and also other approaches to solve the problem.

Thank you very much!

siddharthprabhu · July 2, 2020, 8:39am

Hi @bernahgrl, welcome to RStudio Community.

It would be much easier for people to help you if you post a reproducible example (or reprex). Please read this post to learn how to create a minimal reprex.

In the absence of a reprex, I can only provide general advice. summarise() only keeps the grouping variables and summary measures. If you want to retain all variables, consider using mutate() instead to add those measures into a new variable and then work with that.

aggregate() isn't a dplyr function. If you want to perform your analysis per company, you might want to add company as a grouping variable.

system · July 23, 2020, 8:47am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.