Data Frame dividing column values by the respective row value

Dear All,

I have a data frame with one date column and 8 columns with stock prices. I am trying to divide all column values by their respective value of a particular row (i.e. by a price at a specific date) to normalize the data. Unfortunately, I have not figured out how to do this yet (the best solution would be a solution where the specific date of the row can be selected, such as 2020-03-02, to quickly change the base values to normalize the data if needed).

Thanks!

1 Like

Hi,

It's not entirely clear what you like to do, so maybe if my suggestion below is not what you are looking for, create a reprex following the guide below. A reprex consists of the minimal code and data needed to recreate the issue/question you're having. You can find instructions how to build and share one here:

You can use the group_by() from dplyr to perform actions on certain rows like this

library(dplyr)

#Generate some random data
set.seed(10)
myData = data.frame(
  date = as.Date(sample(c("2020-11-19", "2020-11-18"), 10, replace = T)),
  val1 = 1:10, val2 = runif(10))

myData
#>          date val1       val2
#> 1  2020-11-19    1 0.65165567
#> 2  2020-11-19    2 0.56773775
#> 3  2020-11-18    3 0.11350898
#> 4  2020-11-18    4 0.59592531
#> 5  2020-11-18    5 0.35804998
#> 6  2020-11-19    6 0.42880942
#> 7  2020-11-18    7 0.05190332
#> 8  2020-11-18    8 0.26417767
#> 9  2020-11-19    9 0.39879073
#> 10 2020-11-19   10 0.83613414

#Apply functions by group (in this case date)
myData %>% group_by(date) %>%
  mutate(val1 = scale(val1)[,1],
         val2 = (val2 - min(val2)) / (max(val2) - min(val2)))
#> # A tibble: 10 x 3
#> # Groups:   date [2]
#>    date          val1   val2
#>    <date>       <dbl>  <dbl>
#>  1 2020-11-19 -1.14   0.578 
#>  2 2020-11-19 -0.892  0.386 
#>  3 2020-11-18 -1.16   0.113 
#>  4 2020-11-18 -0.675  1     
#>  5 2020-11-18 -0.193  0.563 
#>  6 2020-11-19  0.0991 0.0686
#>  7 2020-11-18  0.772  0     
#>  8 2020-11-18  1.25   0.390 
#>  9 2020-11-19  0.842  0     
#> 10 2020-11-19  1.09   1

Created on 2020-11-19 by the reprex package (v0.3.0)

Hope this helps,
PJ

Hi @pieterjanvc

Perhaps I did not express myself clearly enough. Let's have a look at the following example data frame, which is similar to the original:

data.frame(
      StockA = c(15, 20, 26, 10, 5),
      StockB = c(27, 16, 15, 10, 23),
                            date = as.factor(c("2020-03-01",
                                               "2020-03-02","2020-03-03",
                                               "2020-03-04","2020-03-04"))
                    )

What I try to do is to divide all column values of the columns StockA and StockB by the values in a certain row. For example, I want all values in the columns StockA and StockB to be divided by the values in row 2020-03-02 (i.e. 20 and 16 respectively). Therefore the values in this row are normalized to 1 and the other values are adjusted accordingly. But I did not find any code for this, because I do not want to select the corresponding lines for the code by [x,] but by explicitly mentioning the date of the row. In addition, the original data frame is much larger, so it is not possible to do it manually by just dividing each column by the row value.

Hopefully it is clearer now and thanks in advance.

Is this what you mean?

library(dplyr)

sample_df <- data.frame(
    StockA = c(15, 20, 26, 10, 5),
    StockB = c(27, 16, 15, 10, 23),
    date = as.Date(c("2020-03-01",
                       "2020-03-02","2020-03-03",
                       "2020-03-04","2020-03-04"))
)

sample_df %>% 
    mutate(across(.cols = starts_with("stock"),
                  .fns = ~ . / sample_df %>%
                      filter(date == '2020-03-02') %>%
                      pull(cur_column())
                  )
           )
#>   StockA StockB       date
#> 1   0.75 1.6875 2020-03-01
#> 2   1.00 1.0000 2020-03-02
#> 3   1.30 0.9375 2020-03-03
#> 4   0.50 0.6250 2020-03-04
#> 5   0.25 1.4375 2020-03-04

Created on 2020-11-20 by the reprex package (v0.3.0.9001)

Hi @andresrcs

Yes, that's exactly what I'm looking for, thank you. There is just one small problem left:
Is it possible to multiply all values by 100, i.e. that the normalized values are 100 instead of 1 and all other values adjust accordingly?

I very much apologise for my question but I am still a RStudio beginner.

You simply have to multiply

sample_df %>% 
    mutate(across(.cols = starts_with("stock"),
                  .fns = ~ . / sample_df %>%
                      filter(date == '2020-03-02') %>%
                      pull(cur_column()) * 100)
    )

If your question's been answered (even by you!), would you mind choosing a solution? It helps other people see which questions still need help, or find solutions if they have similar problems. Here’s how to do it:

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.