I have the following (sample) dput() for a dataframe consisting of non-unique IDs, countries, and columns representing country names:
structure(list(id = c(1, 1, 2, 3, 4, 4), country = c("USA", "Japan",
"Germany", "Austria", "Japan", "Austria"), USA = c(0, 0, 0, 0,
0, 0), Germany = c(0, 0, 0, 0, 0, 0), Japan = c(0, 0, 0, 0, 0,
0), Austria = c(0, 0, 0, 0, 0, 0), Belgium = c(0, 0, 0, 0, 0,
0)), class = "data.frame", row.names = c(NA, -6L))
I would like to alter the above dataframe such that, for every id i
, each column with a country name "I.e., df$Germany
, df$Japan
" has a value == 1 if df$country
== 1. I.e., rows 1 and 2 should have df$Japan == 1
and df$USA == 1
; rows 5 and 6 should have df$Japan == 1
and df$Austria == 1
.
Are there any ways to do this without using for loops or pivot? (notice my current data has a "Belgium" column but no Belgium value in df$country
-- pivot will not work here as far as I understand). For loops take much too long--I have 30+ datasets of 100k+ observations each that I need to rapidly apply this too.
Someone on a different thread suggested: df[levels(factor(df$country))] = model.matrix(~country - 1, df)
but this gives an erroneous structure, the dput being given below:
structure(list(id = c(1, 1, 2, 3, 4, 4), country = c("USA", "Japan",
"Germany", "Austria", "Japan", "Austria"), USA = c(1, 0, 0, 0,
0, 0), Germany = c(0, 0, 1, 0, 0, 0), Japan = c(0, 1, 0, 0, 1,
0), Austria = c(0, 0, 0, 1, 0, 1), Belgium = c(0, 0, 0, 0, 0,
0)), row.names = c(NA, -6L), class = "data.frame")
I've also thought about using mutate()
with cur_column()
but I'm not sure as to how to even write that code.