Create frames in loop over string variable values

gsford · February 24, 2024, 2:53pm

I'm an R newbie trying to figure out the basics.

My data (frame df) has 24 model names and daily price data. I aim to create data frames for each model and to winsorize each model's prices.

I tried:

names <- c("Model1", "Model2", "Model3")
for (i in names) {
print(i)
}

which works.

But when I try to create frames:
for (i in names) {
i <- df %>% filter(name == i)
}
It does not work. I get a df called i.

I can create the frames independently and it works, but a loop would be cleaner.

So, in the loop, how do I get R to treat the index i as something other than i?

FJCC · February 24, 2024, 4:50pm

Here are two ways to make smaller data frames based on the values in one column. The first returns a list of data frames, and the second makes independent data frames named after the values in the given column.

DF <- data.frame(name = c("Model1", "Model2", "Model3","Model1", "Model2", "Model3"),
                 Value = 1:6)
DF
#>     name Value
#> 1 Model1     1
#> 2 Model2     2
#> 3 Model3     3
#> 4 Model1     4
#> 5 Model2     5
#> 6 Model3     6

#Make a list of data frames.
ListOfDF <- split(DF, DF$name)
ListOfDF
#> $Model1
#>     name Value
#> 1 Model1     1
#> 4 Model1     4
#> 
#> $Model2
#>     name Value
#> 2 Model2     2
#> 5 Model2     5
#> 
#> $Model3
#>     name Value
#> 3 Model3     3
#> 6 Model3     6

#Use the assign() function to make independent data frames.
for(Nm in unique(DF$name)) {
  assign(Nm, DF[DF$name == Nm, ])
}
ls() #show all the objects that now are in the environment
#> [1] "DF"       "ListOfDF" "Model1"   "Model2"   "Model3"   "Nm"

Created on 2024-02-24 with reprex v2.0.2
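On the original question of getting R to treat the loop index as something other than a plain string: with the list approach, the string can be used to look up the matching data frame with `[[ ]]`. The following is only a minimal sketch built on the same toy DF as above; the name OneDF is made up for illustration.

```r
# Same toy data as in the reprex above
DF <- data.frame(name = c("Model1", "Model2", "Model3", "Model1", "Model2", "Model3"),
                 Value = 1:6)
ListOfDF <- split(DF, DF$name)

# Nm is a character string; ListOfDF[[Nm]] is the data frame stored under that name
for (Nm in names(ListOfDF)) {
  OneDF <- ListOfDF[[Nm]]  # OneDF is an illustrative name, not from the original post
  cat(Nm, "has", nrow(OneDF), "rows; mean Value =", mean(OneDF$Value), "\n")
}
```

Keeping the pieces in a list avoids filling the environment with many separate objects, and the same loop keeps working as the number of models grows.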

gsford · February 24, 2024, 7:03pm

Thanks much. It worked as intended.

gsford · February 24, 2024, 7:07pm

I want to winsorize within a loop through these frames, and tried

for(Nm in unique(df$name)) {
pricew <- Winsorize(Nm$price, minval = NULL , maxval = NULL ,
probs = c(0.01,0.99), na.rm = TRUE, type = 1)
Nm$price <- pricew
}

But it doesn't like it. Is there a solution?

FJCC · February 24, 2024, 8:20pm

Here are three ways to transform the price column of a data frame, treating rows with different values in the name column separately. I don't have the Winsorize() function, so I used scale(), which sets the mean of the values to zero and the standard deviation to one.

DF <- data.frame(name = rep(c("Model1", "Model2", "Model3"),4),
                 price = runif(n = 12,min = 25,max = 100))
DF
#>      name    price
#> 1  Model1 90.54648
#> 2  Model2 76.48961
#> 3  Model3 36.07650
#> 4  Model1 27.95938
#> 5  Model2 43.13003
#> 6  Model3 33.33270
#> 7  Model1 62.58777
#> 8  Model2 54.85340
#> 9  Model3 60.32941
#> 10 Model1 82.00785
#> 11 Model2 96.41996
#> 12 Model3 64.20060

#Method 1. Put all the results in three data frames with a for loop

for(Nm in unique(DF$name)) {
  tmp <- DF[DF$name == Nm, ]
  tmp$price <- scale(tmp$price)
  assign(Nm, tmp)
}
Model1
#>      name      price
#> 1  Model1  0.8912794
#> 4  Model1 -1.3606422
#> 7  Model1 -0.1146918
#> 10 Model1  0.5840546
Model2
#>      name      price
#> 2  Model2  0.3714638
#> 5  Model2 -1.0421075
#> 8  Model2 -0.5453442
#> 11 Model2  1.2159878
Model3
#>      name      price
#> 3  Model3 -0.7740921
#> 6  Model3 -0.9452647
#> 9  Model3  0.7389261
#> 12 Model3  0.9804307

#Method 2. Make data frames in a list and process them
DF_Split <- split(DF, DF$name)
DF_Split
#> $Model1
#>      name    price
#> 1  Model1 90.54648
#> 4  Model1 27.95938
#> 7  Model1 62.58777
#> 10 Model1 82.00785
#> 
#> $Model2
#>      name    price
#> 2  Model2 76.48961
#> 5  Model2 43.13003
#> 8  Model2 54.85340
#> 11 Model2 96.41996
#> 
#> $Model3
#>      name    price
#> 3  Model3 36.07650
#> 6  Model3 33.33270
#> 9  Model3 60.32941
#> 12 Model3 64.20060
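#Note: the function returns the value of the assignment, so each list
#element is the scaled price matrix, not the whole data frame
DF_New2 <- lapply(DF_Split, function(DATA) DATA$price <- scale(DATA$price))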
DF_New2
#> $Model1
#>            [,1]
#> [1,]  0.8912794
#> [2,] -1.3606422
#> [3,] -0.1146918
#> [4,]  0.5840546
#> attr(,"scaled:center")
#> [1] 65.77537
#> attr(,"scaled:scale")
#> [1] 27.79275
#> 
#> $Model2
#>            [,1]
#> [1,]  0.3714638
#> [2,] -1.0421075
#> [3,] -0.5453442
#> [4,]  1.2159878
#> attr(,"scaled:center")
#> [1] 67.72325
#> attr(,"scaled:scale")
#> [1] 23.59951
#> 
#> $Model3
#>            [,1]
#> [1,] -0.7740921
#> [2,] -0.9452647
#> [3,]  0.7389261
#> [4,]  0.9804307
#> attr(,"scaled:center")
#> [1] 48.4848
#> attr(,"scaled:scale")
#> [1] 16.02949

#Method 3. Use dplyr functions and have the results in one data frame
#Notice the original row order is preserved
#scale() returns a one-column matrix, so the column prints as price[,1]
library(dplyr)

DF_New3 <- DF |> group_by(name) |> mutate(price = scale(price))
DF_New3
#> # A tibble: 12 × 2
#> # Groups:   name [3]
#>    name   price[,1]
#>    <chr>      <dbl>
#>  1 Model1     0.891
#>  2 Model2     0.371
#>  3 Model3    -0.774
#>  4 Model1    -1.36 
#>  5 Model2    -1.04 
#>  6 Model3    -0.945
#>  7 Model1    -0.115
#>  8 Model2    -0.545
#>  9 Model3     0.739
#> 10 Model1     0.584
#> 11 Model2     1.22 
#> 12 Model3     0.980

Created on 2024-02-24 with reprex v2.0.2
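A note on Method 2: as written, the anonymous function returns only the scaled price matrix, so DF_New2 holds matrices rather than data frames. If whole data frames are wanted in the list, one possible variant is sketched below. It assumes the same kind of DF as above; the set.seed() call and the name DF_New2b are additions for illustration, and as.numeric() is used here to drop the matrix attributes that scale() attaches.

```r
set.seed(1)  # arbitrary seed, added only so the sketch is reproducible
DF <- data.frame(name = rep(c("Model1", "Model2", "Model3"), 4),
                 price = runif(n = 12, min = 25, max = 100))

DF_Split <- split(DF, DF$name)

# Modify the price column, then return the whole data frame explicitly
DF_New2b <- lapply(DF_Split, function(DATA) {
  DATA$price <- as.numeric(scale(DATA$price))  # plain numeric vector, no attributes
  DATA
})
DF_New2b$Model1
```

The same as.numeric() wrapper should also give a plain price column in Method 3 instead of price[,1].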

Your code didn't work because you are confusing strings like "Model1" with the name of the data frame Model1 (with no quotes). In your for loop, the Nm variable takes the values "Model1", "Model2", "Model3". Those are character values. The code Nm$price will not work because Nm is a character variable, not a data frame. Model1$price (with no quotes around Model1) does work because Model1 is the name of a data frame.
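If separate data frames named after the models are still wanted, get() looks up an object from its name given as a string, and assign() writes it back. The sketch below is only an illustration under some assumptions: winsorize_vec() is a hypothetical helper written in base R as a stand-in for Winsorize(), which was not available here, and it clamps values to the 1st and 99th percentiles using quantile() with type = 1. The toy prices are made up.

```r
# Hypothetical helper, not from any package: clamp x to the given quantiles
winsorize_vec <- function(x, probs = c(0.01, 0.99)) {
  lims <- quantile(x, probs = probs, na.rm = TRUE, type = 1)
  pmin(pmax(x, lims[1]), lims[2])
}

# Made-up data: 101 prices per model, each with one extreme value
DF <- data.frame(name = rep(c("Model1", "Model2"), each = 101),
                 price = c(1:100, 1000, 201:300, 5000))

# Create one data frame per model, as in the assign() approach
for (Nm in unique(DF$name)) {
  assign(Nm, DF[DF$name == Nm, ])
}

# Nm is a string; get(Nm) fetches the data frame that has that name
for (Nm in unique(DF$name)) {
  tmp <- get(Nm)
  tmp$price <- winsorize_vec(tmp$price)
  assign(Nm, tmp)
}
range(Model1$price)
```

With very few rows per model the 1st and 99th percentiles coincide with the minimum and maximum, so nothing would be clamped; that is why the toy data here uses 101 rows per model.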
