First off, just as an FYI, the line filter(!is.na(col1))
does not do anything in this code, the tibble stays 195x47.
Second, the reason why the pivot_longer()
does not work, is because columns in a dataframe need to be the same data type. With this code you are putting the original column names in a new column - that's fine. But then you are also putting the contents of those original columns all in one column: but some are numeric (e.g. col2
, col3
, etc) and some are characters (e.g. colA1
, colA2
, etc). The function has an optional argument values_transform
for which the help file says: " If not specified, the type of the columns generated from names_to
will be character, and the type of the variables generated from values_to
will be the common type of the input columns used to generate them.". And in your case, there is no common data type, there are different data types and pivot_longer()
does not want to coerce data types from one to the other.
If you run the code without the part that transforms the numbers from character to numeric, the pivot_longer()
works. However, you then get a warning for the pivot_wider()
: Warning message: Values from
value are not uniquely identified; output will contain list-cols.
Which I am sure you don't want (every column is now a list, and you end up with 1756 columns).
Okay - so after trying to figure out what you want exactly, I think the function is maybe also not creating what you think it is creating? If I run your code as-is, up to the map_df()
line only, it produces:
# A tibble: 195 × 47
Municipio col1 col2 col3 colA1 colA2 colA3 colB1 colB2 colB3 colC1 colC2 colC3 colD1 colD2 colD3
<chr> <chr> <dbl> <dbl> <chr> <dbl> <dbl> <chr> <dbl> <dbl> <chr> <dbl> <dbl> <chr> <dbl> <dbl>
1 Huelmo PSOE 248 5030 S. A. … 0 0 F 0 0 EXIS… 0 0 FE d… 0 0
2 Busdongo PP 44 3928 FE de … 0 0 SAE 0 0 J. U… 0 0 PH 0 0
3 Duero PSOE 89 8240 S. A. … 0 0 F 0 0 EXIS… 0 0 SAE 0 0
4 Coin PSOE 1637 3968 P-LIB 0 0 SUMAR 0 0 CREE… 0 0 PMR 0 0
5 Anduva PSOE 148 5016 EXISTE 0 0 SAE 0 0 J. U… 0 0 PH 0 0
6 Huelmo PP 72 1460 NA NA NA NA NA NA NA NA NA NA NA NA
7 Busdongo PSOE 33 2946 NA NA NA NA NA NA NA NA NA NA NA NA
8 Duero VOX 2 185 NA NA NA NA NA NA NA NA NA NA NA NA
9 Coin PP 924 2240 NA NA NA NA NA NA NA NA NA NA NA NA
10 Anduva PP 77 2610 NA NA NA NA NA NA NA NA NA NA NA NA
# ℹ 185 more rows
# ℹ 31 more variables: colE1 <chr>, colE2 <dbl>, colE3 <dbl>, colF1 <chr>, colF2 <dbl>, colF3 <dbl>,
# colG1 <chr>, colG2 <dbl>, colG3 <dbl>, colH1 <chr>, colH2 <dbl>, colH3 <dbl>, colI1 <chr>,
# colI2 <dbl>, colI3 <dbl>, colJ1 <chr>, colJ2 <dbl>, colJ3 <dbl>, colK1 <chr>, colK2 <dbl>,
# colK3 <dbl>, colL1 <chr>, colL2 <dbl>, colL3 <dbl>, colM1 <chr>, colM2 <dbl>, colM3 <dbl>,
# Acol1 <chr>, Acol2 <dbl>, Acol3 <dbl>, colunicipio <chr>
# ℹ Use `print(n = ...)` to see more rows
So the characters in A1
, B1
, C1
become col1
, and all the values in A2
, B2
, C2
, etc go in col2
, and A3
, B3
, C3
etc in col3
? But then you also still have colA1
, colA2
, colA3
, etc, all populated with either characters, 0
s or NA
s...? From the example you are showing, it looks like you don't need/want all this? Why not get rid of those? This code seems to work (see my comments with the piped code part):
# Use same dataset - not shown here, large
col_voto <- c("Cod","A2","A3","B2","B3","C2","C3","D2","D3","E2","E3","F2","F3","G2","G3","H2","H3","I2","I3","J2","J3","K2","K3","L2","L3","M2","M3","N2","N3","O2","O3", "P2","P3","Q2","Q3","R2","R3","S2","S3","T2","T3","U2","U3","V2","V3","W2","W3","X2","X3","Y2","Y3","Z2","Z3","AA2","AA3","AB2","AB3","AC2","AC3","AD2","AD3",
"AE2","AE3","AF2","AF3","AG2","AG3","AH2","AH3","AI2","AI3","AJ2","AJ3","AK2","AK3","AL2","AL3","AM2","AM3")
df <- df %>%
mutate(across(all_of(col_voto), as.numeric))
stack_data = function(i) {
d = select(df, Municipio, starts_with(i), -Cod)
names(d) = str_replace(names(d), i, 'col')
d
}
df3 <- map_df(c('A', 'B','C', 'D', 'E', 'F', 'G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z','AA','AB','AC','AD','AE','AF','AG',
'AH','AI','AJ','AK','AL','AM'), stack_data) |>
select(Municipio, col1, col2, col3) |> # only select the columns you need
filter(!is.na(col1)) |> # FYI does nothing, tibble stays 195x47
filter(!is.na(Municipio)) |> # Is this maybe what you want? Somehow there are 5 rows with NA for Municipio
pivot_longer(c(-'Municipio', -'col1')) |> # This now works since col2 and col3 are both numeric
pivot_wider(names_from = c(col1, name), values_from = value) |>
# You now also don't need the select code below anymore
# select(Municipio, contains('PSOE'), contains('PP'), contains('IZQESP'), contains('CEUS'), contains('EL PI'), contains('CCa'), contains('GBAI'),
# contains('EAJ-PNV'), contains('FO'), contains('AHORA REPÚBLICAS (ERC-EH BILDU-BNG-ARA MÉS)'), contains('JUNTS-UE'), contains('ANDALUCISTAS'),
# contains('IE'), contains('VOX'), contains('ESCAÑOS EN BLANCO'), contains('CRT'), contains('PODEMOS'), contains('PFAC'), contains('PCTE'), contains('PCTC'),
# contains('PCPE/PCPC'), contains('SALAMANCA-ZAMORA-LEÓN PREPAL'), contains('P-LIB'), contains('SUMAR'),contains('PACMA'), contains('CREE EN EUROPA'),
# contains('PMR'), contains('S.A.L.F'), contains('F'), contains('EXISTE'), contains('Cs'), contains('FE de las JONS'), contains('SAE'), contains('J.U.E.X.'),
# contains('PH'), contains('VOLT'), contains('RECORTES CERO'), contains('PAZ SOBERANÍA REPÚBLICA'), contains('FIEL'),contains('IZQP - UNIDOS - DEf'),
# contains('EXTREMEÑOS, Y SUBSIDIARIAMENTE EX PREX CREX'), contains('glg'), contains('PIRATES.CAT / REBELDES'), contains('NEX')) |>
mutate_all(~replace_na(., 0))
The final dataframe (I called it df3
to compare with the original) is not exactly the same as the example, but it still contains all the column names that you specified in the select()
part, and a few extra (your code did not have PODEMOS
in it, but your example did).
Not knowing anything whatsoever about your data, this seems like what you need? If not, please give some more detail.
Edited to that this is the result:
# A tibble: 5 × 79
Municipio PSOE_col2 PSOE_col3 PP_col2 PP_col3 VOX_col2 VOX_col3 PODEMOS_col2 PODEMOS_col3 Cs_col2
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Huelmo 248 5030 72 1460 41 831 66 1338 21
2 Busdongo 33 2946 44 3928 4 357 20 1785 3
3 Duero 89 8240 2 185 2 185 0 0 2
4 Coin 1637 3968 924 2240 107 259 296 717 386
5 Anduva 148 5016 77 2610 15 508 15 508 18
# ℹ 69 more variables: Cs_col3 <dbl>, CEUS_col2 <dbl>, CEUS_col3 <dbl>, PH_col2 <dbl>, PH_col3 <dbl>,
# `AHORA REPÚBLICAS (ERC-EH BILDU-BNG-ARA MÉS)_col2` <dbl>,
# `AHORA REPÚBLICAS (ERC-EH BILDU-BNG-ARA MÉS)_col3` <dbl>, `FE de las JONS_col2` <dbl>,
# `FE de las JONS_col3` <dbl>, `RECORTES CERO_col2` <dbl>, `RECORTES CERO_col3` <dbl>,
# `PCPE/PCPC_col2` <dbl>, `PCPE/PCPC_col3` <dbl>, IZQESP_col2 <dbl>, IZQESP_col3 <dbl>,
# `JUNTS-UE_col2` <dbl>, `JUNTS-UE_col3` <dbl>, PCTE_col2 <dbl>, PCTE_col3 <dbl>, PACMA_col2 <dbl>,
# PACMA_col3 <dbl>, ANDALUCISTAS_col2 <dbl>, ANDALUCISTAS_col3 <dbl>, FO_col2 <dbl>, FO_col3 <dbl>, …