library(dplyr)
library(tidyr)
DF <- read.csv("~/R/Play/Dummy.csv")
DF
#> id math X history Y geo Z
#> 1 1 1/12/2023 A 9/10/2023 A 14/9/2023 A
#> 2 1 15/9/2023 B 4/7/2023 D 19/2/2023 D
#> 3 2 18/12/2023 A 6/7/2023 C 10/4/2023 A
#> 4 2 30/8/2023 C 5/6/2023 B 24/12/2023 C
#> 5 3 5/6/2023 D 19/3/2023 A 27/5/2023 D
#> 6 3 11/11/2023 B 14/6/2023 A 27/9/2023 B
colnames(DF)[seq(3,7,2)] <- paste(colnames(DF)[seq(2,6,2)],
colnames(DF)[seq(3,7,2)], sep = "_")
DF
#> id math math_X history history_Y geo geo_Z
#> 1 1 1/12/2023 A 9/10/2023 A 14/9/2023 A
#> 2 1 15/9/2023 B 4/7/2023 D 19/2/2023 D
#> 3 2 18/12/2023 A 6/7/2023 C 10/4/2023 A
#> 4 2 30/8/2023 C 5/6/2023 B 24/12/2023 C
#> 5 3 5/6/2023 D 19/3/2023 A 27/5/2023 D
#> 6 3 11/11/2023 B 14/6/2023 A 27/9/2023 B
DF |> group_by(id) |> mutate(subrow = row_number()) |>
pivot_longer(cols = -c("id", "subrow"), names_to = "Var") |>
pivot_wider(names_from = c("Var", "subrow"), names_sep = "_", values_from = "value")
#> # A tibble: 3 × 13
#> # Groups: id [3]
#> id math_1 math_X_1 history_1 history_Y_1 geo_1 geo_Z_1 math_2 math_X_2
#> <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1 1/12/2023 A 9/10/2023 A 14/9/… A 15/9/… B
#> 2 2 18/12/2023 A 6/7/2023 C 10/4/… A 30/8/… C
#> 3 3 5/6/2023 D 19/3/2023 A 27/5/… D 11/11… B
#> # ℹ 4 more variables: history_2 <chr>, history_Y_2 <chr>, geo_2 <chr>,
#> # geo_Z_2 <chr>
I'm curious, if you don't mind sharing: What is your use case? Why do you want to create a single row per id value? In an application to calculating grades (which this looks like), I would think that a longer version would be more useful than a wider version.
Hi, thanks for the help. The problem with the solution you gave me is that in my df the variables "history", "math" and "geo" are dates and in your df there aren't. So I can't use your code because I can't combine dates with character.
Hello, no problem with the question. Actually, I have to do this same but with a much longer base and with values of another type. This example I made was just as a reprex. So it may look that it's not worth doing all this for this example, but I really need it for another case.
I didn't mean to suggest it wasn't worth it, just that sometimes one's actual use case may help folks find a better solution than they might come up with based on an understanding informed by a limited example. It looks like you were satisfied with @ vedoa's solution, but if you're up for it, I would be curious to know more about your use case. Otherwise, best wishes on further progress with your work.
Hi, I have a very large database with many repeated IDs and I find more optimal to have a database with one observation per ID for cleaning, processing, validation and cross analysis of variables.
I think it's easier that way.