Dear all,
I have a simple question. I am trying to calculate the average value of different selected columns in a tibble.
For example, I have 364 columns. I would like to calculate one average value of Col 2, Col7 and Col 12, and also for another average value of Col 3 to 6, Col 8 to 11 and Col 13 to 16, and so on.
At the end, I would like to add as new columns.
Can anyone help me how to code?
It's unclear whether you are looking to take the mean rowwise or column wise.
mtcars[,1:3]
#> mpg cyl disp
#> Mazda RX4 21.0 6 160.0
#> Mazda RX4 Wag 21.0 6 160.0
#> Datsun 710 22.8 4 108.0
#> Hornet 4 Drive 21.4 6 258.0
#> Hornet Sportabout 18.7 8 360.0
#> Valiant 18.1 6 225.0
#> Duster 360 14.3 8 360.0
#> Merc 240D 24.4 4 146.7
#> Merc 230 22.8 4 140.8
#> Merc 280 19.2 6 167.6
#> Merc 280C 17.8 6 167.6
#> Merc 450SE 16.4 8 275.8
#> Merc 450SL 17.3 8 275.8
#> Merc 450SLC 15.2 8 275.8
#> Cadillac Fleetwood 10.4 8 472.0
#> Lincoln Continental 10.4 8 460.0
#> Chrysler Imperial 14.7 8 440.0
#> Fiat 128 32.4 4 78.7
#> Honda Civic 30.4 4 75.7
#> Toyota Corolla 33.9 4 71.1
#> Toyota Corona 21.5 4 120.1
#> Dodge Challenger 15.5 8 318.0
#> AMC Javelin 15.2 8 304.0
#> Camaro Z28 13.3 8 350.0
#> Pontiac Firebird 19.2 8 400.0
#> Fiat X1-9 27.3 4 79.0
#> Porsche 914-2 26.0 4 120.3
#> Lotus Europa 30.4 4 95.1
#> Ford Pantera L 15.8 8 351.0
#> Ferrari Dino 19.7 6 145.0
#> Maserati Bora 15.0 8 301.0
#> Volvo 142E 21.4 4 121.0
rowMeans(mtcars[,1:3])
#> Mazda RX4 Mazda RX4 Wag Datsun 710 Hornet 4 Drive
#> 62.33333 62.33333 44.93333 95.13333
#> Hornet Sportabout Valiant Duster 360 Merc 240D
#> 128.90000 83.03333 127.43333 58.36667
#> Merc 230 Merc 280 Merc 280C Merc 450SE
#> 55.86667 64.26667 63.80000 100.06667
#> Merc 450SL Merc 450SLC Cadillac Fleetwood Lincoln Continental
#> 100.36667 99.66667 163.46667 159.46667
#> Chrysler Imperial Fiat 128 Honda Civic Toyota Corolla
#> 154.23333 38.36667 36.70000 36.33333
#> Toyota Corona Dodge Challenger AMC Javelin Camaro Z28
#> 48.53333 113.83333 109.06667 123.76667
#> Pontiac Firebird Fiat X1-9 Porsche 914-2 Lotus Europa
#> 142.40000 36.76667 50.10000 43.16667
#> Ford Pantera L Ferrari Dino Maserati Bora Volvo 142E
#> 124.93333 56.90000 108.00000 48.80000
colMeans(mtcars[,1:3])
#> mpg cyl disp
#> 20.09062 6.18750 230.72188
Created on 2020-12-03 by the reprex package (v0.3.0.9001)
Yes, row wise mean! But the columns are not in series as you said. So, I couldn't use ":" operator, like "1:3". Let's say, there are ten columns. I want to calculate the row wise mean of Col 3, 6 and 9. Also, I want to calculate the row wise mean of the columns Col 1,2,4,5,7, and 8. Something like that.
Mean 1 = (Col 3 + Col 6 + Col 9)/3
Mean 2 = (Col 1 + Col 2 + Col 4 + Col 5 + Col 7 + Col 8)/6
.
.
.
.
Mean n = ...
Try this
mtcars["group1"] <- rowMeans(mtcars[,c(3,6,9)], na.rm = TRUE)
mtcars["group1"]
#> group1
#> Mazda RX4 54.54000
#> Mazda RX4 Wag 54.62500
#> Datsun 710 37.10667
#> Hornet 4 Drive 87.07167
#> Hornet Sportabout 121.14667
#> Valiant 76.15333
#> Duster 360 121.19000
#> Merc 240D 49.96333
#> Merc 230 47.98333
#> Merc 280 57.01333
#> Merc 280C 57.01333
#> Merc 450SE 93.29000
#> Merc 450SL 93.17667
#> Merc 450SLC 93.19333
#> Cadillac Fleetwood 159.08333
#> Lincoln Continental 155.14133
#> Chrysler Imperial 148.44833
#> Fiat 128 27.30000
#> Honda Civic 26.10500
#> Toyota Corolla 24.64500
#> Toyota Corona 40.85500
#> Dodge Challenger 107.17333
#> AMC Javelin 102.47833
#> Camaro Z28 117.94667
#> Pontiac Firebird 134.61500
#> Fiat X1-9 27.31167
#> Porsche 914-2 41.14667
#> Lotus Europa 32.53767
#> Ford Pantera L 118.39000
#> Ferrari Dino 49.59000
#> Maserati Bora 101.85667
#> Volvo 142E 41.59333
Created on 2020-12-03 by the reprex package (v0.3.0.9001)
It works! Thanks indeed!
Dear Technocrat,
How could I attached those calculated values to the original rows?
Sorry, I am not so much familiar with RStudio.
Thanks in advance.
It's ok, we all start from zero. Mine was in 2007, and I'm still learning.
Using the pattern mtcars["group1"] <- rowMeans(mtcars[,c(3,6,9)], na.rm = TRUE)
sticks the result column onto the data frame by that way. It's not necessarily the optimal way to do this, but optimization is something to address after the easy ways have become a nuisance, not necessarily now.
I already calculated using your code. Calculation worked well. But I also want to attach those values to the original table.
Col 1 Col 2 Col 3 ...Mean 1 Mean 2
Row 1 -
will add a column named group1
Wow, so great! Thanks again. You helped me a lot.
R is like that, full of obscure features that are blindingly obvious once you see them applied to your own problem. What was illustrated is subset. The only trick is that it is row, column
Dear Technocrat,
Have a great day!
I have another three requests:
- I would like to automatically filter out the values which are greater or less than 10% of their previous values. Is it possible to code in RStudio? Do you have any idea?
- I am also trying to calculate the minimum/maximum/average values for selected rows (column wise). I have several rows more than 2000. From them, I would like to select only 5 rows (in series), calculate the minimum (max or average) value of those selected rows, and then add a new row using that calculated values for each column.
- I am trying several minute data for several year. So far, I could code only for one day. I would like to automatically run for another day. Is it possible to code in RStudio? Do you have any idea?
Thanks in advance.
- The
dplyr::lag
function can be used to perform operations on ranges of previous row entries in the same or a different column. It may be convenient todplyr::mutate
a new result column based on the test and then%>%
tofilter(TEST != 1)
(for example). - You can
subset
rows with columns
min(mtcars[2:5,1])
#> [1] 18.7
max(mtcars[2:5,1])
#> [1] 22.8
mean(mtcars[2:5,1], na.rm = TRUE)
#> [1] 20.975
Created on 2020-12-06 by the reprex package (v0.3.0.9001)
3. Construct your code as a function, say run_day
. Then you can create a vector of days and use purrr:map
or lappy
to apply the function to each of the days and return a nested list.
I tried your explanation, but I couldn't follow.
- First, I checked the lag function in RStudio. But it does not work well.
For example,
I have 30-min data for the whole day. I would like to filter the values in every next column by the values +/- 10% of those values in every previous column. In the morning, the values will increase according the values in the previous columns + 10% of those values, but in the afternoon, the values will decrease by - 10%. Something like that. Let me show the excel calculation as follow:
6:00 | 6:30 | 7:00 | 7:30 | 8:00 | 8:30 | 9:00 | 9:30 | 10:00 | 10:30 | 11:00 | 11:30 | 12:00 | 12:30 | 13:00 | 13:30 | 14:00 | 14:30 | 15:00 | 15:30 | 16:00 | 16:30 | 17:00 | 17:30 | 18:00 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
10.00 | 11.00 | 12.10 | 13.31 | 14.64 | 16.11 | 17.72 | 19.49 | 21.44 | 23.58 | 25.94 | 11.00 | 12.10 | 10.89 | 9.80 | 8.82 | 7.94 | 7.14 | 6.43 | 5.79 | 5.21 | 4.69 | 4.22 | 3.80 | 3.42 |
I want to code like this in RStudio.
-
I would like to calculate the minimum values of row number 5~9 from 100 rows for every columns.
When I used the codes you provided, the only one minimum value is resulted. I don't know what I should do. -
I didn't try it yet. I will try it later.
I hope to be helpful by linking you to this forum guide to asking questions.
I think a large part of what is holding back the effective communications, is a lack of example data,
which should help to make your comments on it more concrete.
FAQ: Tips for writing R-related questions - meta - RStudio Community
I will also link to r4ds which is a good way for anyone to learn R/tidyverse, and in particular data transformation https://r4ds.had.co.nz/
Thank you, Nirgrahamuk! I will follow the rule.
- Try something like
filter <- data$column > lag(data$column) * 0.1 # has values TRUE or FALSE
- If you need more averages, I think you cannot avoid using a
for
cycle.
This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.
If you have a query related to it or one of the replies, start a new topic and refer back with a link.