Restructuring a dataframe

ricdob · July 4, 2023, 12:15pm

Hey Guys,

i´d like to rearrange a dataframe I use and simply can´t figure out how to do it.
The dataframe looks like this:

Now I want to create a new column which includes all the brand names from the column names. As you can see the first row includes the information of what kind of indicators are in these rows. So I want In_Net_Score to be the column name and so on. The dataset should be arranged by the brand names. So I want to create a dataset which has the Row Date first with the dates, after that the Brand e.g. Mars, and after that all the Numbers so In_Net_Score, In_Volume and to on. Second row would be Mars for the second date then with the same column. Till no data for mars is left and the dataset moves on with the next brand.
It should look like this:

Already thanks for your help

FJCC · July 4, 2023, 12:40pm

Please post some actual data, not an image. You can do that by posting the output of the dput() function. If your data frame is named DF, the output of

dput(DF[1:6,1:5])

will provide the first 6 rows and 5 columns of your data set.

FJCC · July 4, 2023, 1:02pm

I think this is at least close to what you want. Notice that the Score and Volume columns are characters. If you want those to be numbers, you can change them with as.numeric().
I included several unnecessary print steps to help illustrate the progression of the code.

library(tidyr)
library(dplyr)

DF <- structure(list(Date = structure(c(NA, 1514764800, 1514851200,
                                  1514937600, 1515024000, 1515110400), tzone = "UTC", class = c("POSIXct","POSIXt")), 
               Mars = c("In_Net_Score", "19.102253999999999", "26.879909999999999",
                        "25.833333", "18.901593999999999", "33"), 
               Mars = c("In_Volume",
                        "33.833333000000003", "32", "23.333333", "35.166666999999997",
                        "34.166666999999997"), 
               Bahlsen = c("In_Net_Score", "24.098123999999999",
                           "28.367433999999999", "38.580247", "30.646163000000001", "40.323956000000003"
               ), 
               Bahlsen = c("In_Volume", "31", "32.166666999999997", "26.166667",
                           "35.666666999999997", "32.166666999999997")), row.names = c(NA,
                                                                                       -6L), class = c("tbl_df", "tbl", "data.frame"))
colnames(DF) <- paste(colnames(DF), DF[1,], sep = ".")
DF <- DF[-1,]
DF
#> # A tibble: 5 × 5
#>   Date.NA             Mars.In_Net_Score  Mars.In_Volume     Bahlsen.In_Net_Score
#>   <dttm>              <chr>              <chr>              <chr>               
#> 1 2018-01-01 00:00:00 19.102253999999999 33.833333000000003 24.098123999999999  
#> 2 2018-01-02 00:00:00 26.879909999999999 32                 28.367433999999999  
#> 3 2018-01-03 00:00:00 25.833333          23.333333          38.580247           
#> 4 2018-01-04 00:00:00 18.901593999999999 35.166666999999997 30.646163000000001  
#> 5 2018-01-05 00:00:00 33                 34.166666999999997 40.323956000000003  
#> # ℹ 1 more variable: Bahlsen.In_Volume <chr>
DF_long <- pivot_longer(DF, cols = -Date.NA)
head(DF_long)
#> # A tibble: 6 × 3
#>   Date.NA             name                 value             
#>   <dttm>              <chr>                <chr>             
#> 1 2018-01-01 00:00:00 Mars.In_Net_Score    19.102253999999999
#> 2 2018-01-01 00:00:00 Mars.In_Volume       33.833333000000003
#> 3 2018-01-01 00:00:00 Bahlsen.In_Net_Score 24.098123999999999
#> 4 2018-01-01 00:00:00 Bahlsen.In_Volume    31                
#> 5 2018-01-02 00:00:00 Mars.In_Net_Score    26.879909999999999
#> 6 2018-01-02 00:00:00 Mars.In_Volume       32
DF_long <- DF_long |> separate(col = "name", into = c("Brand", "Var"), sep = "\\.")
head(DF_long)
#> # A tibble: 6 × 4
#>   Date.NA             Brand   Var          value             
#>   <dttm>              <chr>   <chr>        <chr>             
#> 1 2018-01-01 00:00:00 Mars    In_Net_Score 19.102253999999999
#> 2 2018-01-01 00:00:00 Mars    In_Volume    33.833333000000003
#> 3 2018-01-01 00:00:00 Bahlsen In_Net_Score 24.098123999999999
#> 4 2018-01-01 00:00:00 Bahlsen In_Volume    31                
#> 5 2018-01-02 00:00:00 Mars    In_Net_Score 26.879909999999999
#> 6 2018-01-02 00:00:00 Mars    In_Volume    32
DF_Final <- DF_long |> pivot_wider(names_from = "Var", values_from = "value") |> 
  arrange(Brand)
DF_Final
#> # A tibble: 10 × 4
#>    Date.NA             Brand   In_Net_Score       In_Volume         
#>    <dttm>              <chr>   <chr>              <chr>             
#>  1 2018-01-01 00:00:00 Bahlsen 24.098123999999999 31                
#>  2 2018-01-02 00:00:00 Bahlsen 28.367433999999999 32.166666999999997
#>  3 2018-01-03 00:00:00 Bahlsen 38.580247          26.166667         
#>  4 2018-01-04 00:00:00 Bahlsen 30.646163000000001 35.666666999999997
#>  5 2018-01-05 00:00:00 Bahlsen 40.323956000000003 32.166666999999997
#>  6 2018-01-01 00:00:00 Mars    19.102253999999999 33.833333000000003
#>  7 2018-01-02 00:00:00 Mars    26.879909999999999 32                
#>  8 2018-01-03 00:00:00 Mars    25.833333          23.333333         
#>  9 2018-01-04 00:00:00 Mars    18.901593999999999 35.166666999999997
#> 10 2018-01-05 00:00:00 Mars    33                 34.166666999999997

^{Created on 2023-07-04 with reprex v2.0.2}

ricdob · July 4, 2023, 1:10pm

Thank you very much, this works perfectly!

system · July 11, 2023, 1:24pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.