Code for average of selected columns in tibble?

nchan08 · December 3, 2020, 7:38am

Dear all,
I have a simple question. I am trying to calculate the average value of different selected columns in a tibble.
For example, I have 364 columns. I would like to calculate one average value of Col 2, Col7 and Col 12, and also for another average value of Col 3 to 6, Col 8 to 11 and Col 13 to 16, and so on.
At the end, I would like to add as new columns.
Can anyone help me how to code?

technocrat · December 3, 2020, 8:09am

It's unclear whether you are looking to take the mean rowwise or column wise.

mtcars[,1:3]
#>                      mpg cyl  disp
#> Mazda RX4           21.0   6 160.0
#> Mazda RX4 Wag       21.0   6 160.0
#> Datsun 710          22.8   4 108.0
#> Hornet 4 Drive      21.4   6 258.0
#> Hornet Sportabout   18.7   8 360.0
#> Valiant             18.1   6 225.0
#> Duster 360          14.3   8 360.0
#> Merc 240D           24.4   4 146.7
#> Merc 230            22.8   4 140.8
#> Merc 280            19.2   6 167.6
#> Merc 280C           17.8   6 167.6
#> Merc 450SE          16.4   8 275.8
#> Merc 450SL          17.3   8 275.8
#> Merc 450SLC         15.2   8 275.8
#> Cadillac Fleetwood  10.4   8 472.0
#> Lincoln Continental 10.4   8 460.0
#> Chrysler Imperial   14.7   8 440.0
#> Fiat 128            32.4   4  78.7
#> Honda Civic         30.4   4  75.7
#> Toyota Corolla      33.9   4  71.1
#> Toyota Corona       21.5   4 120.1
#> Dodge Challenger    15.5   8 318.0
#> AMC Javelin         15.2   8 304.0
#> Camaro Z28          13.3   8 350.0
#> Pontiac Firebird    19.2   8 400.0
#> Fiat X1-9           27.3   4  79.0
#> Porsche 914-2       26.0   4 120.3
#> Lotus Europa        30.4   4  95.1
#> Ford Pantera L      15.8   8 351.0
#> Ferrari Dino        19.7   6 145.0
#> Maserati Bora       15.0   8 301.0
#> Volvo 142E          21.4   4 121.0
rowMeans(mtcars[,1:3])
#>           Mazda RX4       Mazda RX4 Wag          Datsun 710      Hornet 4 Drive 
#>            62.33333            62.33333            44.93333            95.13333 
#>   Hornet Sportabout             Valiant          Duster 360           Merc 240D 
#>           128.90000            83.03333           127.43333            58.36667 
#>            Merc 230            Merc 280           Merc 280C          Merc 450SE 
#>            55.86667            64.26667            63.80000           100.06667 
#>          Merc 450SL         Merc 450SLC  Cadillac Fleetwood Lincoln Continental 
#>           100.36667            99.66667           163.46667           159.46667 
#>   Chrysler Imperial            Fiat 128         Honda Civic      Toyota Corolla 
#>           154.23333            38.36667            36.70000            36.33333 
#>       Toyota Corona    Dodge Challenger         AMC Javelin          Camaro Z28 
#>            48.53333           113.83333           109.06667           123.76667 
#>    Pontiac Firebird           Fiat X1-9       Porsche 914-2        Lotus Europa 
#>           142.40000            36.76667            50.10000            43.16667 
#>      Ford Pantera L        Ferrari Dino       Maserati Bora          Volvo 142E 
#>           124.93333            56.90000           108.00000            48.80000
colMeans(mtcars[,1:3])
#>       mpg       cyl      disp 
#>  20.09062   6.18750 230.72188

^{Created on 2020-12-03 by the reprex package (v0.3.0.9001)}

nchan08 · December 4, 2020, 12:09am

Yes, row wise mean! But the columns are not in series as you said. So, I couldn't use ":" operator, like "1:3". Let's say, there are ten columns. I want to calculate the row wise mean of Col 3, 6 and 9. Also, I want to calculate the row wise mean of the columns Col 1,2,4,5,7, and 8. Something like that.
Mean 1 = (Col 3 + Col 6 + Col 9)/3
Mean 2 = (Col 1 + Col 2 + Col 4 + Col 5 + Col 7 + Col 8)/6
.
.
.
.
Mean n = ...

technocrat · December 4, 2020, 12:34am

Try this

mtcars["group1"] <- rowMeans(mtcars[,c(3,6,9)], na.rm = TRUE)
mtcars["group1"]
#>                        group1
#> Mazda RX4            54.54000
#> Mazda RX4 Wag        54.62500
#> Datsun 710           37.10667
#> Hornet 4 Drive       87.07167
#> Hornet Sportabout   121.14667
#> Valiant              76.15333
#> Duster 360          121.19000
#> Merc 240D            49.96333
#> Merc 230             47.98333
#> Merc 280             57.01333
#> Merc 280C            57.01333
#> Merc 450SE           93.29000
#> Merc 450SL           93.17667
#> Merc 450SLC          93.19333
#> Cadillac Fleetwood  159.08333
#> Lincoln Continental 155.14133
#> Chrysler Imperial   148.44833
#> Fiat 128             27.30000
#> Honda Civic          26.10500
#> Toyota Corolla       24.64500
#> Toyota Corona        40.85500
#> Dodge Challenger    107.17333
#> AMC Javelin         102.47833
#> Camaro Z28          117.94667
#> Pontiac Firebird    134.61500
#> Fiat X1-9            27.31167
#> Porsche 914-2        41.14667
#> Lotus Europa         32.53767
#> Ford Pantera L      118.39000
#> Ferrari Dino         49.59000
#> Maserati Bora       101.85667
#> Volvo 142E           41.59333

^{Created on 2020-12-03 by the reprex package (v0.3.0.9001)}

nchan08 · December 4, 2020, 1:31am

It works! Thanks indeed!

nchan08 · December 4, 2020, 3:03am

Dear Technocrat,
How could I attached those calculated values to the original rows?
Sorry, I am not so much familiar with RStudio.
Thanks in advance.

technocrat · December 4, 2020, 3:59am

It's ok, we all start from zero. Mine was in 2007, and I'm still learning.

Using the pattern mtcars["group1"] <- rowMeans(mtcars[,c(3,6,9)], na.rm = TRUE) sticks the result column onto the data frame by that way. It's not necessarily the optimal way to do this, but optimization is something to address after the easy ways have become a nuisance, not necessarily now.

nchan08 · December 4, 2020, 4:47am

I already calculated using your code. Calculation worked well. But I also want to attach those values to the original table.

Col 1 Col 2 Col 3 ...Mean 1 Mean 2
Row 1 -

technocrat · December 4, 2020, 4:59am

will add a column named group1

nchan08 · December 4, 2020, 5:26am

Wow, so great! Thanks again. You helped me a lot.

technocrat · December 4, 2020, 6:12am

R is like that, full of obscure features that are blindingly obvious once you see them applied to your own problem. What was illustrated is subset. The only trick is that it is row, column

nchan08 · December 7, 2020, 12:15am

Dear Technocrat,
Have a great day!
I have another three requests:

I would like to automatically filter out the values which are greater or less than 10% of their previous values. Is it possible to code in RStudio? Do you have any idea?
I am also trying to calculate the minimum/maximum/average values for selected rows (column wise). I have several rows more than 2000. From them, I would like to select only 5 rows (in series), calculate the minimum (max or average) value of those selected rows, and then add a new row using that calculated values for each column.
I am trying several minute data for several year. So far, I could code only for one day. I would like to automatically run for another day. Is it possible to code in RStudio? Do you have any idea?
Thanks in advance.

technocrat · December 7, 2020, 5:31am

The dplyr::lag function can be used to perform operations on ranges of previous row entries in the same or a different column. It may be convenient to dplyr::mutate a new result column based on the test and then %>% to filter(TEST != 1) (for example).
You can subset rows with columns

min(mtcars[2:5,1])
#> [1] 18.7
max(mtcars[2:5,1])
#> [1] 22.8
mean(mtcars[2:5,1], na.rm = TRUE)
#> [1] 20.975

^{Created on 2020-12-06 by the reprex package (v0.3.0.9001)}
3. Construct your code as a function, say run_day. Then you can create a vector of days and use purrr:map or lappy to apply the function to each of the days and return a nested list.

nchan08 · December 7, 2020, 8:30am

I tried your explanation, but I couldn't follow.

First, I checked the lag function in RStudio. But it does not work well.
For example,
I have 30-min data for the whole day. I would like to filter the values in every next column by the values +/- 10% of those values in every previous column. In the morning, the values will increase according the values in the previous columns + 10% of those values, but in the afternoon, the values will decrease by - 10%. Something like that. Let me show the excel calculation as follow:

6:00	6:30	7:00	7:30	8:00	8:30	9:00	9:30	10:00	10:30	11:00	11:30	12:00	12:30	13:00	13:30	14:00	14:30	15:00	15:30	16:00	16:30	17:00	17:30	18:00
10.00	11.00	12.10	13.31	14.64	16.11	17.72	19.49	21.44	23.58	25.94	11.00	12.10	10.89	9.80	8.82	7.94	7.14	6.43	5.79	5.21	4.69	4.22	3.80	3.42

I want to code like this in RStudio.

I would like to calculate the minimum values of row number 5~9 from 100 rows for every columns.
When I used the codes you provided, the only one minimum value is resulted. I don't know what I should do.
I didn't try it yet. I will try it later.

nirgrahamuk · December 7, 2020, 9:32am

I hope to be helpful by linking you to this forum guide to asking questions.
I think a large part of what is holding back the effective communications, is a lack of example data,
which should help to make your comments on it more concrete.
FAQ: Tips for writing R-related questions - meta - RStudio Community

I will also link to r4ds which is a good way for anyone to learn R/tidyverse, and in particular data transformation https://r4ds.had.co.nz/

nchan08 · December 8, 2020, 1:33am

Thank you, Nirgrahamuk! I will follow the rule.

Zack83 · December 8, 2020, 12:37pm

Try something like

filter <- data$column > lag(data$column) * 0.1 # has values TRUE or FALSE

If you need more averages, I think you cannot avoid using a for cycle.

system · December 29, 2020, 12:37pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.