I want to write a for loop for my problem. I want to do column-based normalization within each year group: the loop should first filter one year, normalize each column with my function lapply(tmp[2:3], function(tmp) bestNormalize(tmp, standardize = TRUE, quiet = TRUE)), then move on to the next year, and so on. I want to save the results to a list. My data look like this:
| Year | Score 1 | Score 2 |
|------|---------|---------|
| 2012 | 34      | 45      |
| 2012 | 41      | 46      |
| 2013 | 31      | 44      |
| 2013 | 44      | 33      |
| 2014 | 35      | 56      |
| 2014 | 42      | 21      |
I wrote this, but it only gives me the final year. I am a newbie and could not find an example similar to my case. Can someone help me?
i = 2012
for (i in 1:3) {
  tmp = newdf[newdf$Year == i + 2011, ]
  abc = lapply(tmp[2:3], function(tmp) bestNormalize(tmp, standardize = TRUE, quiet = TRUE))
  print(abc)
}
Could you give some more details or a desired outcome? Do you want to standardize (e.g. subtract the mean and divide by the standard deviation) all values in the data.frame (Score 1 and Score 2 together) within a given year? Or do you want to standardize Score 1 and Score 2 separately within each year group?
I assume the result would be a list with a data.frame for every year, containing 2 columns (Score 1 and Score 2 normalized?). But maybe you can clarify it a bit, so I can think of an optimal solution.
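To make that assumed shape concrete, here is a minimal base R sketch. It uses plain standardization (subtract the mean, divide by the standard deviation) as a stand-in for bestNormalize, and the column names Score1 and Score2 are my assumption, since the post only shows "Score 1" and "Score 2".

```r
# Example data copied from the question; Score1/Score2 are assumed column names.
newdf <- data.frame(
  Year   = c(2012, 2012, 2013, 2013, 2014, 2014),
  Score1 = c(34, 41, 31, 44, 35, 42),
  Score2 = c(45, 46, 44, 33, 56, 21)
)

# split() returns one data.frame per year, named by the year,
# so no year's result overwrites another.
by_year <- split(newdf[c("Score1", "Score2")], newdf$Year)

# Standardize each column within its own year group: (x - mean(x)) / sd(x).
standardized <- lapply(by_year, function(d) {
  as.data.frame(lapply(d, function(x) (x - mean(x)) / sd(x)))
})

names(standardized)           # one list element per year
str(standardized[["2012"]])   # a data.frame with the two standardized columns
```

If this is the structure you are after, the inner function is the only part that would need to change to use a different normalization.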
I found I had to use the out_of_sample parameter: with only 2 entries per variable per year, there was insufficient data for the k-fold cross-validation. I also thought I should use $x.t to get just the transformed data.
Hi, thank you. A "list with a data.frame for every year, containing 2 columns" is exactly what I want, but each score needs to be normalized within each year group.
The result is a list (as required), with the standardized outputs for every year group (as above as well). As @nirgrahamuk mentioned, out_of_sample = FALSE has to be set as well.
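For reference, a loop along these lines might look like the sketch below. The original loop assigns abc anew on every pass, so only the last year survives; here each year's result goes into its own slot of a named list. The column names Score1 and Score2 and the object names results, years and score_cols are my assumptions, and with only two rows per year bestNormalize may still emit warnings, so treat this as a starting point rather than a tested solution.

```r
library(bestNormalize)  # assumes the package is installed

# Example data copied from the question; Score1/Score2 are assumed column names.
newdf <- data.frame(
  Year   = c(2012, 2012, 2013, 2013, 2014, 2014),
  Score1 = c(34, 41, 31, 44, 35, 42),
  Score2 = c(45, 46, 44, 33, 56, 21)
)

score_cols <- c("Score1", "Score2")
years <- unique(newdf$Year)

# Pre-allocate a named list so every year's result is kept.
results <- vector("list", length(years))
names(results) <- years

for (i in seq_along(years)) {
  # Keep only the score columns for the current year.
  tmp <- newdf[newdf$Year == years[i], score_cols]
  # Normalize each column separately; $x.t extracts the transformed values.
  # out_of_sample = FALSE skips cross-validation, which needs more data per group.
  results[[i]] <- as.data.frame(lapply(tmp, function(x) {
    bestNormalize(x, standardize = TRUE, quiet = TRUE,
                  out_of_sample = FALSE)$x.t
  }))
}

str(results)  # one data.frame per year, accessible as results[["2012"]] etc.
```

Dropping $x.t would store the full bestNormalize objects instead, which is useful if you later want to see which transformation was chosen or apply it to new data.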