How to Pivot a Dataframe in R to Count Unique CNPJs and Projects by Year and Analysis Type?

czargab18 · June 16, 2024, 10:47pm

Hello everyone,

I am working on a project in R and facing difficulties with pivoting a dataframe. My dataframe has the following columns:

CNPJ: where some rows have repeated values.
Projects: which contains the names of the projects.
Analysis Suggestion: which can receive two values {"Merit of the Project"; "Merit of the Project and Expenditure"}.
Year: which receives the years {2019, 2020, 2021}.

I would like to create a pivot table with the following specifications:

Columns for the years (2019, 2020, 2021) showing the number of unique CNPJs per year.
Columns for "Merit of the Project" and "Merit of the Project and Expenditure" showing the number of projects for each analysis suggestion type.

Here is an example of the initial dataframe:

df <- data.frame(
  CNPJ = c("12.345.678/0001-91", "12.345.678/0001-91", "98.765.432/0001-99", "12.345.678/0001-91", "98.765.432/0001-99"),
  Projects = c("Project A", "Project B", "Project C", "Project D", "Project E"),
  Analysis_Suggestion = c("Merit of the Project", "Merit of the Project and Expenditure", "Merit of the Project", "Merit of the Project and Expenditure", "Merit of the Project"),
  Year = c(2019, 2020, 2021, 2019, 2020)
)

The expected result would be something like:

  2019 2020 2021 Merit of the Project Merit of the Project and Expenditure
1    1    2    1                   3                                    2

However, I am having trouble ensuring that the years and analysis types are correctly aligned and accounted for.

Can someone guide me on how to correct this code or suggest a better approach to achieve the desired result?

Thank you in advance for your help!

FJCC · June 16, 2024, 11:45pm

I'm sure there is a more elegant solution, but I would do this in two steps.

df <- data.frame(
  CNPJ = c("12.345.678/0001-91", "12.345.678/0001-91", "98.765.432/0001-99", "12.345.678/0001-91", "98.765.432/0001-99"),
  Projects = c("Project A", "Project B", "Project C", "Project D", "Project E"),
  Analysis_Suggestion = c("Merit of the Project", "Merit of the Project and Expenditure", "Merit of the Project", "Merit of the Project and Expenditure", "Merit of the Project"),
  Year = c(2019, 2020, 2021, 2019, 2020)
)
library(tidyverse)
#> Warning: package 'ggplot2' was built under R version 4.3.3
DF1 <- df |> select(Year, CNPJ) |> distinct() |> group_by(Year) |> 
  summarize(N=n()) |> pivot_wider(names_from = Year, values_from = N)
DF2 <- df |> group_by(Analysis_Suggestion) |> summarize(N = n()) |> 
  pivot_wider(names_from = Analysis_Suggestion, values_from = N)

cbind(DF1, DF2)
#>   2019 2020 2021 Merit of the Project Merit of the Project and Expenditure
#> 1    1    2    1                    3                                    2

^{Created on 2024-06-16 with reprex v2.0.2}

system · September 14, 2024, 11:45pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.