Hi there,
I'm developing my first R package to learn more about the process. I'm using data published in annual reports from a local government agency, and I plan to split it into a few data frames so that related data are grouped together. I've found some great tutorials that cover the technical side of creating packages, but my question is: how should I structure these data frames?
I have two options:
A wider format where each row is a year and each variable is a column:
| year | var1 | var2 |
|---|---|---|
| 2025 | 5 | 10 |
| 2024 | 4 | 9 |
| 2023 | 3 | 8 |
A longer format where the variable names are listed in a 'category' column and the values go in a single 'count' column:
| year | category | count |
|---|---|---|
| 2025 | var1 | 5 |
| 2025 | var2 | 10 |
| 2024 | var1 | 4 |
| 2024 | var2 | 9 |
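
In case it helps frame the question, here is a minimal sketch of how I understand the two layouts relate. It assumes the tidyr package (which I haven't committed to using in the package itself), and `wide`, `long`, and `back` are just placeholder names built from the example values above.

```r
library(tidyr)

# Wide layout: one row per year, one column per variable
wide <- data.frame(
  year = c(2025, 2024, 2023),
  var1 = c(5, 4, 3),
  var2 = c(10, 9, 8)
)

# Wide -> long: variable names go into 'category', values into 'count'
long <- pivot_longer(
  wide,
  cols = c(var1, var2),
  names_to = "category",
  values_to = "count"
)

# Long -> wide: spread 'category' back out into one column per variable
back <- pivot_wider(
  long,
  names_from = category,
  values_from = count
)
```

So whichever layout I ship, users could get to the other one in a single call, which is partly why I'm unsure which to treat as the default.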
What is the best practice for including data in packages? In my own analyses I often work with ggplot2, which is designed to work with long-format data. But the datasets I've looked at in existing data packages tend to use wide format. I can't find any discussion or recommendations on this topic, so I'd be grateful for any advice.