Hi Commnity i'm have a data frame with many columns. I need to calculate the mean by each 5 column. The first column is the ID
For example, first mean are column 1 to 5, the second one is column 6 to 10.
The data link:
How make this for all 35 columns. I have othe data with 800 columns.
library(dplyr)
spectra |> group_by(Wavelength) |>
summarise(media= c(spectrum00001.asd.sco+spectrum00002.asd.sco+
spectrum00003.asd.sco+spectrum00004.asd.sco+
spectrum00001.asd.sco)/5) # this run well but is so difficult make one by one.
spectra |> group_by(Wavelength) |> # this run wrong
mutate(promedio = rowMeans(across(seq(1, ncol(spectra), by = 4),
.names = "prom_{.col}")))
I would pivot the data to a long format, extract the spectrum number (e.g. 01,02, etc), make groups by integer division of those numbers by 5, and calculate the average of the groups for each wavelength.
library(tidyr)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(stringr)
DF <- read.csv("~/R/Play/Spectra_v4.csv")
DFlong <- DF |> pivot_longer(cols = -Wavelength, names_to = "Label", values_to = "Value")
DFlong <- DFlong |> mutate(Group = (as.numeric(str_sub(Label,12,13)) -1) %/% 5)
DFlong
#> # A tibble: 75,285 × 4
#> Wavelength Label Value Group
#> <int> <chr> <dbl> <dbl>
#> 1 350 spectrum00001.asd.sco 0.207 0
#> 2 350 spectrum00002.asd.sco 0.209 0
#> 3 350 spectrum00003.asd.sco 0.212 0
#> 4 350 spectrum00004.asd.sco 0.203 0
#> 5 350 spectrum00005.asd.sco 0.205 0
#> 6 350 spectrum00006.asd.sco 0.220 1
#> 7 350 spectrum00007.asd.sco 0.226 1
#> 8 350 spectrum00008.asd.sco 0.225 1
#> 9 350 spectrum00009.asd.sco 0.222 1
#> 10 350 spectrum00010.asd.sco 0.231 1
#> # … with 75,275 more rows
DF_stats <- DFlong |> group_by(Wavelength, Group) |> summarize(Avg = mean(Value))
#> `summarise()` has grouped output by 'Wavelength'. You can override using the
#> `.groups` argument.
DF_stats
#> # A tibble: 15,057 × 3
#> # Groups: Wavelength [2,151]
#> Wavelength Group Avg
#> <int> <dbl> <dbl>
#> 1 350 0 0.207
#> 2 350 1 0.225
#> 3 350 2 0.187
#> 4 350 3 0.157
#> 5 350 4 0.147
#> 6 350 5 0.0979
#> 7 350 6 0.254
#> 8 351 0 0.207
#> 9 351 1 0.226
#> 10 351 2 0.191
#> # … with 15,047 more rows