Reordering and organizing dataframe

jynusmac · May 20, 2023, 12:11pm

This dataframe:

df <- data.frame(
  stringsAsFactors = FALSE,
               Cod = c(7L, 9L, 11L, 14L, 16L),
               Mun = c("Argan", "Balb", "Barj", "Bemb", "Benz"),
                A1 = c("PS", "PP", "PP", "XBEMB", "CBR"),
                A2 = c(342L, 180L, 15L, 2292L, 231L),
                A3 = c(6010L, 8333L, 0L, 4637L, 5422L),
                A4 = c(5L, 6L, 2L, 7L, 4L),
                B1 = c("CBR", "PS", "PS", "BC", "VO"),
                B2 = c(135L, 27L, 10L, 1107L, 184L),
                B3 = c(2372L, 1250L, 0L, 2239L, 4319L),
                B4 = c(1L, 1L, 3L, 3L, 3L),
                C1 = c("PP", NA, NA, "PP", "PP"),
                C2 = c(87L, 0L, 0L, 771L, 11L),
                C3 = c(1528L, 0L, 0L, 1560L, 258L),
                C4 = c(1L, 0L, 0L, 2L, 0L),
                D1 = c(NA, NA, NA, "UP", "PS"),
                D2 = c(0L, 0L, 0L, 336L, 0L),
                D3 = c(0L, 0L, 0L, 679L, 0L),
                D4 = c(0L, 0L, 0L, 1L, 0L),
                E1 = c(NA, NA, NA, "PS", NA),
                E2 = c(0L, 0L, 0L, 235L, 0L),
                E3 = c(0L, 0L, 0L, 475L, 0L),
                E4 = c(0L, 0L, 0L, 0L, 0L),
                F1 = c(NA, NA, NA, "VO", NA),
                F2 = c(0L, 0L, 0L, 113L, 0L),
                F3 = c(0L, 0L, 0L, 228L, 0L),
                F4 = c(0L, 0L, 0L, 0L, 0L),
                G1 = c(NA, NA, NA, "CBR", NA),
                G2 = c(0L, 0L, 0L, 59L, 0L),
                G3 = c(0L, 0L, 0L, 119L, 0L),
                G4 = c(0L, 0L, 0L, 0L, 0L)
)

I want to achieve this:

It's possible? (blanks can be replaced by zeros)

scottyd22 · May 20, 2023, 11:45pm

A more elegant approach probably exists, but here is one way to reshape the data into the desired output (adjust column names as necessary).

library(tidyverse)

# function to isolate each "section" of the data (ie. A1-4, B1-4, etc.)
stack_data = function(i) {
  d = select(df, Mun, contains(i), -Cod)
  names(d) = str_replace(names(d), i, 'col')
  d
}

# map through "sections" A-G, stack, and reshape
map_df(c('A', 'B','C', 'D', 'E', 'F', 'G'), stack_data) |>
  filter(!is.na(col1)) |>
  pivot_longer(c(-'Mun', -'col1')) |>
  pivot_wider(names_from = c(col1, name), values_from = value) |>
  select(Mun, contains('PS'), contains('PP'), contains('CBR'), contains('VO'), 
         contains('XBEMB'), contains('BC'), contains('UP')) |>
  mutate_all(~replace_na(., 0))

Created on 2023-05-20 with reprex v2.0.2

system · May 27, 2023, 11:45pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.