Hi there,
I'm new to R and I'm trying to calculate some statistics on a database. I'm stuck and would really appreciate some help.
I was able to group the data and get the statistics, but only for one grouping and one column.
So I want my script to:
- cut the database based on the selected column (done)
- calculate group statistics for the first specified column (done)
- repeat the same statistics for a different column (not sure how)
- repeat the above steps for a different cut column (not sure how)
I was able to put variables (called Benchmark_Name and Component) into the code, but I don't know how to loop over them. If I specify several column names in a variable, I'd like the script to pick the first one, run the steps, and then repeat for the second one, and so on.
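To make the looping idea concrete, here is a minimal sketch in base R of iterating over two vectors of column names. The data frame is a made-up stand-in for the real database, and the names Benchmark_2 and BONUS are hypothetical; only CODE, Benchmark_1 and BASE appear in my actual code. It computes a plain mean rather than the full set of statistics.

```r
# Hypothetical stand-in for the real database (Benchmark_2 and BONUS are invented names)
RAW <- data.frame(
  CODE        = c("A", "A", "B", "B", "C"),
  Benchmark_1 = c(1, 1, 1, 0, 1),
  Benchmark_2 = c(0, 1, 1, 1, 0),
  BASE        = c(10, 20, 30, 40, 50),
  BONUS       = c(1, 2, 3, 4, 5)
)

benchmark_names <- c("Benchmark_1", "Benchmark_2")  # columns to cut by
components      <- c("BASE", "BONUS")               # columns to summarise

means <- c()
for (benchmark_name in benchmark_names) {
  # Keep only the rows flagged 1 in the current benchmark column
  subset_rows <- RAW[RAW[[benchmark_name]] %in% 1, ]
  for (component in components) {
    # [[ ]] looks a column up by the name stored in a string
    label <- paste(benchmark_name, component, sep = "_")
    means[label] <- mean(subset_rows[[component]], na.rm = TRUE)
  }
}
print(means)
```

The key point is that `[[ ]]` accepts a column name held in a variable, so each pass of the loop can work on a different column.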
I attached a screenshot of a sample of the database (yellow: columns by which the database should be cut, one at a time; green: columns for which statistics should be calculated, one at a time).
My code so far:
library(dplyr)

# Load the raw database
RAW <- read.csv('H:/RawDatabase/Database for R.csv', header = TRUE)
RAW
# Column to cut by and column to compute statistics for
Benchmark_Name <- 'Benchmark_1'
Component <- "BASE"
# Statistics per CODE group, for rows flagged 1 in the benchmark column
results_by_org <- RAW %>%
  filter(!!sym(Benchmark_Name) == 1) %>%
  group_by(CODE) %>%
  summarise(Obs = sum(!is.na(!!sym(Component))),
            Mean_CPY = mean(!!sym(Component), na.rm = FALSE)
  )
# Overall statistics across the per-CODE means, with column names suffixed by the component
final_stats <- results_by_org %>%
  summarise(!!paste0("Orgs_", Component) := nrow(na.omit(results_by_org)),
            !!paste0("Obs_", Component) := sum(Obs, na.rm = TRUE),
            !!paste0("P25_", Component) := quantile(Mean_CPY, probs = 0.25, na.rm = TRUE, type = 6),
            !!paste0("Mean_", Component) := mean(Mean_CPY, na.rm = TRUE),
            !!paste0("P50_", Component) := quantile(Mean_CPY, probs = 0.5, na.rm = TRUE, type = 6),
            !!paste0("P75_", Component) := quantile(Mean_CPY, probs = 0.75, na.rm = TRUE, type = 6)) %>%
  # Record which benchmark column was used and move it to the front
  mutate(Benchmark_Name = !!Benchmark_Name) %>%
  select(Benchmark_Name, everything())
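
One possible direction, as a rough sketch rather than a finished solution: wrap the two steps above in a function and call it once per benchmark/component pair. The function name benchmark_stats, the sample data, and the column names Benchmark_2 and BONUS are all hypothetical. It also differs from my code above in two ways that are assumptions on my part: it returns one row per pair in a "long" layout (with Benchmark_Name and Component columns) instead of suffixing the statistic names, and it counts Orgs as the number of non-missing group means.

```r
library(dplyr)

# Hypothetical helper: two-step summary for one benchmark/component pair
benchmark_stats <- function(data, benchmark_name, component) {
  # Step 1: statistics per CODE group, for rows flagged 1 in the benchmark column
  results_by_org <- data %>%
    filter(.data[[benchmark_name]] == 1) %>%
    group_by(CODE) %>%
    summarise(
      Obs      = sum(!is.na(.data[[component]])),
      Mean_CPY = mean(.data[[component]], na.rm = FALSE)
    )

  # Step 2: overall statistics across the per-CODE means
  results_by_org %>%
    summarise(
      Benchmark_Name = benchmark_name,
      Component      = component,
      Orgs = sum(!is.na(Mean_CPY)),
      Obs  = sum(Obs, na.rm = TRUE),
      P25  = unname(quantile(Mean_CPY, probs = 0.25, na.rm = TRUE, type = 6)),
      Mean = mean(Mean_CPY, na.rm = TRUE),
      P50  = unname(quantile(Mean_CPY, probs = 0.5, na.rm = TRUE, type = 6)),
      P75  = unname(quantile(Mean_CPY, probs = 0.75, na.rm = TRUE, type = 6))
    )
}

# Made-up sample data standing in for the real CSV
RAW <- data.frame(
  CODE        = c("A", "A", "B", "B", "C", "C"),
  Benchmark_1 = c(1, 1, 1, 1, 1, 0),
  Benchmark_2 = c(0, 1, 1, 1, 0, 0),
  BASE        = c(10, 20, 30, 50, 60, 70),
  BONUS       = c(1, 3, 5, 7, 9, 11)
)

# Every benchmark/component combination, one row each
combos <- expand.grid(
  benchmark_name = c("Benchmark_1", "Benchmark_2"),
  component      = c("BASE", "BONUS"),
  stringsAsFactors = FALSE
)

# Run the helper for each combination and stack the one-row results
all_stats <- bind_rows(lapply(seq_len(nrow(combos)), function(i) {
  benchmark_stats(RAW, combos$benchmark_name[i], combos$component[i])
}))
print(all_stats)
```

With the real data, RAW would come from read.csv as above and the two vectors passed to expand.grid would hold the actual yellow and green column names.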