How can I condense all the information in a single row? I know I have to use pivot_longer but I don't know how to solve it.

Hi, I'm stuck with this problem and I don't know how to solve it.
I have this base

id math X history Y geo Z
1 1/12/2023 A 9/10/2023 A 14/9/2023 A
1 15/9/2023 B 4/7/2023 D 19/2/2023 D
2 18/12/2023 A 6/7/2023 C 10/4/2023 A
2 30/8/2023 C 5/6/2023 B 24/12/2023 C
3 5/6/2023 D 19/3/2023 A 27/5/2023 D
3 11/11/2023 B 14/6/2023 A 27/9/2023 B

and I need each id to have only one observation like this

id math_1 math_2 math_x_1 math_x_2 history_1 history_2 history_y_1 history_y_2 geo_1 geo_2 geo_z_1 geo_z_2
1 1/12/2023 15/9/2023 A B 9/10/2023 4/7/2023 A D 14/9/2023 19/2/2023 A D
2 18/12/2023 30/8/2023 A C 6/7/2023 5/6/2023 C B 10/4/2023 24/12/2023 A C
3 5/6/2023 11/11/2023 D B 19/3/2023 14/6/2023 A A 27/5/2023 27/9/2023 D B

How can I condense all the information in a single row? I know I have to use pivot_longer but I don't know how to solve it.

I would do it like this:

library(dplyr)
library(tidyr)

DF <- read.csv("~/R/Play/Dummy.csv")
DF
#>   id       math X   history Y        geo Z
#> 1  1  1/12/2023 A 9/10/2023 A  14/9/2023 A
#> 2  1  15/9/2023 B  4/7/2023 D  19/2/2023 D
#> 3  2 18/12/2023 A  6/7/2023 C  10/4/2023 A
#> 4  2  30/8/2023 C  5/6/2023 B 24/12/2023 C
#> 5  3   5/6/2023 D 19/3/2023 A  27/5/2023 D
#> 6  3 11/11/2023 B 14/6/2023 A  27/9/2023 B
colnames(DF)[seq(3,7,2)] <- paste(colnames(DF)[seq(2,6,2)], 
                                  colnames(DF)[seq(3,7,2)], sep = "_")
DF
#>   id       math math_X   history history_Y        geo geo_Z
#> 1  1  1/12/2023      A 9/10/2023         A  14/9/2023     A
#> 2  1  15/9/2023      B  4/7/2023         D  19/2/2023     D
#> 3  2 18/12/2023      A  6/7/2023         C  10/4/2023     A
#> 4  2  30/8/2023      C  5/6/2023         B 24/12/2023     C
#> 5  3   5/6/2023      D 19/3/2023         A  27/5/2023     D
#> 6  3 11/11/2023      B 14/6/2023         A  27/9/2023     B

DF |> group_by(id) |> mutate(subrow = row_number()) |> 
  pivot_longer(cols = -c("id", "subrow"), names_to = "Var") |> 
  pivot_wider(names_from = c("Var", "subrow"), names_sep = "_", values_from = "value")
#> # A tibble: 3 × 13
#> # Groups:   id [3]
#>      id math_1     math_X_1 history_1 history_Y_1 geo_1  geo_Z_1 math_2 math_X_2
#>   <int> <chr>      <chr>    <chr>     <chr>       <chr>  <chr>   <chr>  <chr>   
#> 1     1 1/12/2023  A        9/10/2023 A           14/9/… A       15/9/… B       
#> 2     2 18/12/2023 A        6/7/2023  C           10/4/… A       30/8/… C       
#> 3     3 5/6/2023   D        19/3/2023 A           27/5/… D       11/11… B       
#> # ℹ 4 more variables: history_2 <chr>, history_Y_2 <chr>, geo_2 <chr>,
#> #   geo_Z_2 <chr>

Created on 2024-01-13 with reprex v2.0.2

Hi @juandmaz ,

my approach

d <- data.frame(id = c(1,1,2,2,3,3),
                math = c("01.12.2023",	"15.09.2023",	"18.12.2023",	"30.08.2023",	"05.06.2023",	"11.11.2023"),
                X =	c("A",	"B",	"A", "C",	"D",	"B"),
                history = c("09.10.2023","04.07.2023",	"06.07.2023",	"05.06.2023",	"19.03.2023",	"14.06.2023"),
                Y = c("A", "D",	"C", "B", "A", "A"),
                geo = c("14.09.2023",	"19.02.2023",	"10.04.2023",	"24.12.2023",	"27.05.2023",	"27.09.2023"),
                Z = c("A","D","A","C","D","B"))

library(tidyr)
library(dplyr)

dFinal <- d |> 
  group_by(id) |> 
  mutate(forWider = 1:n()) |>
  pivot_wider(names_from = forWider, values_from = !(c(id, forWider)))
1 Like

Hi Juan,

I'm curious, if you don't mind sharing: What is your use case? Why do you want to create a single row per id value? In an application to calculating grades (which this looks like), I would think that a longer version would be more useful than a wider version.

Hi, thanks for the help. The problem with the solution you gave me is that in my df the variables "history", "math" and "geo" are dates and in your df there aren't. So I can't use your code because I can't combine dates with character.

Hello, no problem with the question. Actually, I have to do this same but with a much longer base and with values of another type. This example I made was just as a reprex. So it may look that it's not worth doing all this for this example, but I really need it for another case.

I didn't mean to suggest it wasn't worth it, just that sometimes one's actual use case may help folks find a better solution than they might come up with based on an understanding informed by a limited example. It looks like you were satisfied with @ vedoa's solution, but if you're up for it, I would be curious to know more about your use case. Otherwise, best wishes on further progress with your work.

Hi, I have a very large database with many repeated IDs and I find more optimal to have a database with one observation per ID for cleaning, processing, validation and cross analysis of variables.
I think it's easier that way.

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.