Allocating loops to variable names

freddywit · February 23, 2021, 12:13am

Hey guys,

I managed to write a loop for almost all of my commands.

This is my data.frame:

  Date          OCC1990 DEGREE EMPSTAT 
1    1990-01     457      1      10  
2    1990-01     223      1      10   
3    1990-01     221      0      10   
4    1990-01     229      0      2   
5    1990-02     223      1      10   
6    1990-02     224      0      10    
7    1990-02     225      0      10

I successfully created a loop that counts the number of people who have a degree and are employed in a given occupation, so the i's in the loop are occupation codes. The variable `COLLEMP` refers only to people who are working (regardless of occupation) and who graduated.

for (i in 223:229) {
  COLLEGE <- paste(i, sep = "_")
  EMPLOYED <- paste(i, sep = "_")
  
  cps_data[[COLLEGE]] <- ifelse(
    ( 
      (cps_data$COLLEMP %in% c(1)) &
        (cps_data$OCC1990 == i)),
    1*cps_data$HWTFINL, 0) 
  
  cps_data[[EMPLOYED]] <- ifelse(
    ( 
      (cps_data$EMPSTAT %in% c(10))&
        (cps_data$OCC1990 == i)),
    1*cps_data$HWTFINL, 0) 
  
  COLLEGESUM <- paste(i, sep = "_")
  EMPLOYEDSUM <- paste(i, sep = "_")
  COLLEGE_SUM <- paste(i, sep = "_")
  EMPLOYED_SUM <- paste(i, sep = "_")
  OCC <- paste(i, sep = "_") 
  
  COLLEGELAB[[COLLEGESUM]] <- cps_data %>%                                        
    group_by(YEAR) %>%                         
    summarise_at(vars(COLLEGE),             
                 list(COLLEGE_SUM = sum)) 
  COLLEGELAB[[EMPLOYEDSUM]] <- cps_data %>%                                        
    group_by(YEAR) %>%                         
    summarise_at(vars(EMPLOYED),             
                 list(EMPLOYED_SUM = sum)) 
  
  COLLEGELAB[[OCC]] <- cps_data %>%                                        
    group_by(YEAR) %>%                         
    summarise_at(vars(COLLEGE, EMPLOYED),             
                 list(COLLEGE_SUM = sum, EMPLOYED_SUM = sum))
}

Up to this point the looping worked, and I stored my results in the data.frame COLLEGELAB, which now looks like this:

  223.YEAR 223.COLLEGE_COLLEGE_SUM 223.COLLEGE_EMPLOYED_SUM 224.YEAR 224.COLLEGE_COLLEGE_SUM 224.COLLEGE_EMPLOYED_SUM  college_rate
1     1990                  873221                  1004000     1990                  773221                  2024000 desired value
2     1991                  834501                  1030000     1991                  734501                  2030000 desired value
3     1992                  834543                  1200000     1992                  734543                  2200000 desired value
4     1993                  843500                  1000050     1993                  743500                  2000050 desired value
5     1994                  834510                  1040000     1994                  734510                  2040000 desired value
6     1995                  834340                  1005000     1995                  734340                  2005000 desired value

What I am trying to do now is divide, for example, 223.COLLEGE_COLLEGE_SUM by 223.COLLEGE_EMPLOYED_SUM. This seems quite complicated because the two variable names already contain the different i's, which I used throughout to loop over the occupations.

college_rate <- paste(i, sep = "_")
COLLEGE_COLLEGE_SUM <- paste(i, sep = ".", "COLLEGE_COLLEGE_SUM")
COLLEGE_EMPLOYED_SUM <- paste(i, sep = ".", "COLLEGE_EMPLOYED_SUM")
#PROBLEM
COLLEGELAB <- transform(COLLEGELAB, college_rate =  
                          COLLEGE_COLLEGE_SUM / COLLEGE_EMPLOYED_SUM)

My last idea, using paste again, did not work: it fails with the error "non-numeric argument to binary operator".
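A likely cause of that error, sketched under assumptions: paste() returns a character string, so dividing the two pasted names divides two strings rather than the two columns. Looking the columns up by name with [[ ]] avoids that. The data frame df and its two column names below are made up for illustration and are much smaller than the real COLLEGELAB.

```r
# Hypothetical two-row stand-in for COLLEGELAB (names chosen for illustration).
df <- data.frame(
  `223.COLLEGE_SUM`  = c(873221, 834501),
  `223.EMPLOYED_SUM` = c(1004000, 1030000),
  check.names = FALSE  # keep the names exactly as written, digits first
)

i <- 223
num <- paste(i, "COLLEGE_SUM", sep = ".")   # the string "223.COLLEGE_SUM"
den <- paste(i, "EMPLOYED_SUM", sep = ".")  # the string "223.EMPLOYED_SUM"

# Dividing the strings themselves fails; capture the message instead of stopping.
msg <- tryCatch(num / den, error = function(e) conditionMessage(e))

# Using the strings as column names works.
df[[paste(i, "college_rate", sep = ".")]] <- df[[num]] / df[[den]]
```

Here msg holds the error text from the string division, while the last line adds a numeric 223.college_rate column.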

Many thanks in advance!

technocrat · February 23, 2021, 12:39am

See the FAQ: How to do a minimal reproducible example reprex for beginners. This kind of problem without representative data makes it difficult to give very specific suggestions.

In general, while it's possible to do this in R, anyone who knows an imperative/procedural programming language such as C, C++ or Python should script this outside R to generate a cut-and-paste script.

I just finished banging my head on the R logic before turning to Python. This snippet will give you a flavor in a case where I needed to loop over one object and use the index to take values from two objects.

def use_py():
  Vars = ["DO","pH","Temperature","Turbidity","ORP","Ammonium","Nitrates","BGA","Chlorophyll"]
  part1 = "multcompBoxplot("
  part2 = "~ Month, data=bpdata[,c(1,"
  part3 = ")])"
  for i in range(0,9): print(part1,Vars[i],part2,i+2,part3)

multcompBoxplot(DO ~ Month, data = bpdata[, c(1, 2)])
multcompBoxplot(pH ~ Month, data = bpdata[, c(1, 3)])
multcompBoxplot(Temperature ~ Month, data = bpdata[, c(1, 4)])
multcompBoxplot(Turbidity ~ Month, data = bpdata[, c(1, 5)])
multcompBoxplot(ORP ~ Month, data = bpdata[, c(1, 6)])
multcompBoxplot(Ammonium ~ Month, data = bpdata[, c(1, 7)])
multcompBoxplot(Nitrates ~ Month, data = bpdata[, c(1, 8)])
multcompBoxplot(BGA ~ Month, data = bpdata[, c(1, 9)])
multcompBoxplot(Chlorophyll ~ Month, data = bpdata[, c(1, 10)])

For the OP snip, be aware that nothing defined in the function environment, except possibly the last iteration, is going to escape into the global environment unless you initialize a collector object outside the function and increment it within the loop. See help(for), which has some admonitions about this.
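A minimal sketch of the collector pattern mentioned above, assuming a plain top-level for loop: ordinary variables assigned inside the loop are overwritten on every pass, so only the last value survives, whereas a list created before the loop keeps one entry per iteration. The names results, tmp, and combined are illustrative only.

```r
# Collector created before the loop starts.
results <- list()

for (i in 223:225) {
  # tmp is overwritten on every pass; after the loop it holds only i = 225.
  tmp <- data.frame(occ = i, doubled = i * 2)

  # Store each pass under a name built from i, so nothing is lost.
  results[[as.character(i)]] <- tmp
}

# Stack the per-iteration pieces into one data frame.
combined <- do.call(rbind, results)
```

After the loop, results has one element per occupation code and combined has one row per code.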

freddywit · February 23, 2021, 3:39pm

Hey thanks for your message!
Your message helped me loop most of my commands. But now I face another problem, which I've added to the post above. I also made my example reproducible; sorry for not doing so in the beginning.

I hope this makes it clearer.
Many thanks in advance!

technocrat · February 24, 2021, 7:54am

The very long variable names made it difficult to be sure of your intent.

suppressPackageStartupMessages({
  library(dplyr)
})

obj <- data.frame(
  yr1 =
    c(1990, 1991, 1992, 1993, 1994, 1995),
  coll_sum1 =
    c(873221, 834501, 834543, 843500, 834510, 834340),
  coll_emp1 =
    c(1004000, 1030000, 1200000, 1000050, 1040000, 1005000),
  yr2 =
    c(1990, 1991, 1992, 1993, 1994, 1995),
  coll_sum2 =
    c(773221, 734501, 734543, 743500, 734510, 734340),
  coll_emp2 =
    c(2024000, 2030000, 2200000, 2000050, 2040000, 2005000),
  coll_rate =
    c("desiredvalue", "desiredvalue", "desiredvalue", "desiredvalue", 
      "desiredvalue", "desiredvalue")
)

obj %>% mutate(coll_rate = coll_sum2/coll_emp2)
#>    yr1 coll_sum1 coll_emp1  yr2 coll_sum2 coll_emp2 coll_rate
#> 1 1990    873221   1004000 1990    773221   2024000 0.3820262
#> 2 1991    834501   1030000 1991    734501   2030000 0.3618232
#> 3 1992    834543   1200000 1992    734543   2200000 0.3338832
#> 4 1993    843500   1000050 1993    743500   2000050 0.3717407
#> 5 1994    834510   1040000 1994    734510   2040000 0.3600539
#> 6 1995    834340   1005000 1995    734340   2005000 0.3662544

freddywit · February 24, 2021, 11:19am

Thanks again for your message!
The issue I face is that I already created coll_sum2 and coll_emp2, as well as coll_sum3, coll_emp3, and so on, with a loop. The loop runs over 400 numbers, so I end up with coll_sum400 and coll_emp400. This makes it impractical to divide them "manually" with mutate. Maybe I was not precise enough about that. So my actual question is: how do I create a loop that divides coll_sumi by coll_empi for i in 1:400, if the variable names already contain i?
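One way this could be sketched, assuming the columns really follow the coll_sum1, coll_emp1, coll_sum2, ... pattern from technocrat's example: build each column name with paste0() inside the loop and use it with [[ ]] on both sides of the assignment. Only two index values and three years are used here to keep it short; the coll_rate names are an assumption.

```r
# Cut-down version of technocrat's obj (three years, two occupations).
obj <- data.frame(
  yr        = c(1990, 1991, 1992),
  coll_sum1 = c(873221, 834501, 834543),
  coll_emp1 = c(1004000, 1030000, 1200000),
  coll_sum2 = c(773221, 734501, 734543),
  coll_emp2 = c(2024000, 2030000, 2200000)
)

for (i in 1:2) {  # would be 1:400 on the real data
  sum_col  <- paste0("coll_sum", i)
  emp_col  <- paste0("coll_emp", i)
  rate_col <- paste0("coll_rate", i)

  # [[ ]] takes the name as a string, so the index can be part of it.
  obj[[rate_col]] <- obj[[sum_col]] / obj[[emp_col]]
}
```

This adds coll_rate1 and coll_rate2 next to the existing columns; the same loop body should work unchanged for a larger range as long as every coll_sum and coll_emp pair exists.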

Many thanks in advance!

nirgrahamuk · February 24, 2021, 11:46am

As technocrat suggests, specific help is much easier to deliver when a reprex is provided. Could you try to do this?

freddywit · February 24, 2021, 7:26pm

Of course!!

Based on technocrat's df:

obj <- data.frame (
  yr = c(1990, 1991, 1992, 1993, 1994, 1995),
  occ = c(223, 224, 225, 226, 227, 228),
  emp = c(10, 10, 10, 10, 10, 10),
  degree = c(1, 1, 1, 1, 1, 1)
)

COLLEGELAB <- data.frame (yr = c(1990, 1991, 1992, 1993, 1994, 1995))

for (i in 223:228) {
  COLLEGE <- paste(i, sep = "_")
  EMPLOYED <- paste(i, sep = "_")
  
  obj[[COLLEGE]] <- ifelse(
    ( 
      (obj$degree %in% c(1)) &
        (obj$occ == i)),
    1, 0) 
  
  obj[[EMPLOYED]] <- ifelse(
    ( 
      (obj$emp %in% c(10))&
        (obj$occ == i)),
    1, 0)
  
  COLLEGESUM <- paste(i, sep = "_")
  EMPLOYEDSUM <- paste(i, sep = "_")
  COLLEGE_SUM <- paste(i, sep = "_")
  EMPLOYED_SUM <- paste(i, sep = "_")
  OCC <- paste(i, sep = "_") 
  
  COLLEGELAB[[OCC]] <- obj %>%                                        
    group_by(yr) %>%                         
    summarise_at(vars(COLLEGE, EMPLOYED),             
                 list(COLLEGE_SUM = sum, EMPLOYED_SUM = sum))
  
# problem of dividing college_sum by employed_sum across all i's since every i occurs already in columns names

  #goal:  collegerate = COLLEGE_SUMi / EMPLOYED_SUMi 
}

In my real data frame there are millions of values, and they obviously differ from each other as well.

Many thanks guys!

nirgrahamuk · February 24, 2021, 7:54pm

Hi again, I'm sorry but I'm going to suggest you put more effort into making the reprex.
The data shows only constant emp and degree, therefore it seems substantially different from what you originally posed.
I would think you can avoid exotic for loops to do the processing.
Probably pivot_wider and friends would cover most of your needs.
But it would help to get a simple 'verbal' description of what your data means and what you want to do with it.

Anyway, here is an example where I take your obj and make it directly without for loops:

(obj_1 <- data.frame (
  yr = c(1990, 1991, 1992, 1993, 1994, 1995),
  occ = c(223, 224, 225, 226, 227, 228),
  emp = c(10, 10, 10, 10, 10, 10),
  degree = c(1, 1, 1, 1, 1, 1)
))

library(tidyverse)

(obj_2 <- bind_cols(
  obj_1,
  pivot_wider(obj_1,
              id_cols = yr, names_from = occ,
              values_from = occ
  ) %>%
    mutate_at(-1, ~ ifelse(!is.na(.), 1, 0)) %>%
    select(-yr)
))

freddywit · February 24, 2021, 11:14pm

Thanks for your response!
Well, in my real data the variables degree and emp do not contain constant values.

COLLEGELAB[[OCC]] <- obj %>%                                        
    group_by(yr) %>%                         
    summarise_at(vars(COLLEGE, EMPLOYED),             
                 list(COLLEGE_SUM = sum, EMPLOYED_SUM = sum))

The code above gives the sum of COLLEGE (people who obtained a degree and who belong to occupation i) as well as the sum of EMPLOYED (people who are currently employed in occupation i). My GOAL is now to create a college_rate for every occupation i, dividing COLLEGE_SUM by EMPLOYED_SUM for EVERY i.
I have been struggling with that all along because the column names of COLLEGE_SUM and EMPLOYED_SUM already contain the i of their occupation, which made it impossible for me to write a for loop based on i. I see that your approach is an alternative to a for loop, but I'm still struggling to fit it into my code.

Many thanks!

technocrat · February 25, 2021, 12:30am

Don’t put everything in the same data frame. Split by your variable of interest into a list of data frames, map over the data frames to create your target variable.
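A rough base-R sketch of that split-and-map idea, under stated assumptions: the sample rows below are invented so that emp and degree vary, "employed" is taken to mean emp == 10 as in the loop above, and "college" is taken to mean employed with degree == 1. The names by_occ and rates are illustrative.

```r
# Invented sample data; the real data would have many more rows.
obj <- data.frame(
  yr     = c(1990, 1990, 1990, 1990, 1991, 1991, 1991, 1991),
  occ    = c(223, 223, 224, 224, 223, 223, 224, 224),
  emp    = c(10, 10, 10, 2, 10, 10, 10, 10),
  degree = c(1, 0, 1, 1, 0, 0, 1, 0)
)

# One data frame per occupation, named by occupation code.
by_occ <- split(obj, obj$occ)

# For each occupation: flag rows, sum the flags per year, then divide.
rates <- lapply(by_occ, function(d) {
  d$employed <- as.integer(d$emp == 10)
  d$college  <- as.integer(d$emp == 10 & d$degree == 1)
  out <- aggregate(cbind(college, employed) ~ yr, data = d, FUN = sum)
  out$college_rate <- out$college / out$employed
  out
})
```

Each element of rates, for example rates[["223"]], is a small per-year table with its own college_rate column, so the occupation code lives in the list name rather than in the column names.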

nirgrahamuk · February 25, 2021, 6:21am

That's understood. You could greatly improve the reprex by providing data that is representative in that regard.
