Dear all,
Have a great day!
I am trying to calculate the minimum/maximum/average values for selected rows (column-wise). I have more than 2000 rows. From them, I would like to select only 5 consecutive rows, calculate the minimum (or max, or average) value of those selected rows, and then add a new row holding the calculated value for each column.
Thanks in advance.
You can do that with dplyr. Specifically, filter() selects rows based on a condition and slice() selects them by position, and summarize() lets you compute summary statistics such as mean or max. You can find detailed explanations here. To automate column-wise calculations for many columns, you can also take a look at across().
For example:
library(tidyverse)
dat <- data.frame(x = 1:50,
                  y = 11:60)
subset_dat <- slice(dat, 10:15)
bind_rows(subset_dat,
          summarize(subset_dat,
                    across(where(is.numeric), mean)))
#> x y
#> 1 10.0 20.0
#> 2 11.0 21.0
#> 3 12.0 22.0
#> 4 13.0 23.0
#> 5 14.0 24.0
#> 6 15.0 25.0
#> 7 12.5 22.5
Created on 2020-12-04 by the reprex package (v0.3.0)
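The same pattern can be extended to the minimum and maximum asked about in the original question. This is only a rough sketch: the "stat" label column is an illustrative assumption, not something from the thread.

```r
library(dplyr)

dat <- data.frame(x = 1:50, y = 11:60)
subset_dat <- slice(dat, 10:15)

# One summary row per statistic, computed column-wise on the selected rows
summary_rows <- bind_rows(
  min  = summarize(subset_dat, across(where(is.numeric), min)),
  max  = summarize(subset_dat, across(where(is.numeric), max)),
  mean = summarize(subset_dat, across(where(is.numeric), mean)),
  .id = "stat"   # hypothetical label column naming each statistic
)

# Append the summary rows below the selected rows
# ("stat" is NA for the original rows)
bind_rows(subset_dat, summary_rows)
```

If the label column is not wanted, dropping the names and the .id argument gives three unlabeled summary rows instead.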
Dear AlexisW,
Thanks indeed!
I tried your code, but it does not work for me.
Actually, I would like to calculate the minimum values of rows 5~9, out of 100 rows, for every column.
When I used the code you provided, only one minimum value resulted. I don't know what I should do.
I don't understand. Do you mean rows 5-9 for every 100 rows? So it would be the min of rows 5,6,7,8,9 then the min of rows 109,110,111,112,113, then min of rows 213,214,etc...
Or something else? It would be easier to help if you provided a minimal example with the expected result, as described here.
It would be min of rows 5,6,7,8, and 9 from the rows 1 to 100.
So, just rows 5,6,7,8,9? What do you mean by "from the rows 1 to 100"?
I have many rows (let's say 100 rows, so the row numbers run from 1 to 100). From those 100 rows, I would like to take out only a few rows (let's say row numbers 5,6,7,8,9). And then I would like to calculate the minimum value of rows 5,6,7,8,9.
Oh, you actually want the min across columns? Then with that code it should work:
library(tidyverse)
dat <- data.frame(x = sample(1:50),
                  y = sample(1:50))
subset_dat <- slice(dat, 5:9)
cbind(subset_dat,
      min_x_y = matrixStats::rowMins(as.matrix(subset_dat)))
#> x y min_x_y
#> 1 30 34 30
#> 2 27 2 2
#> 3 50 49 49
#> 4 3 5 3
#> 5 36 27 27
Created on 2020-12-08 by the reprex package (v0.3.0)
Dear AlexisW
I couldn't run your code. The result is as follows:
library(reprex)
dat <- data.frame(x = sample(1:50),
                  y = sample(1:50))
data
[1] 341 345 338 339 340 343 341 343 341 328 343 347 337
[14] 348 339
subset_dat <- slice(dat, 5:9)
Error in slice(dat, 5:9) : could not find function "slice"
I don't know what the error is.
You are missing library(tidyverse), which loads dplyr, which provides the slice() function.
Yes, I tried with tidyverse. But it doesn't work for me the way it does for you.
library(tidyverse)
-- Attaching packages --------------------------------------- tidyverse 1.3.0 --
v ggplot2 3.3.2 v purrr 0.3.4
v tibble 3.0.3 v dplyr 1.0.2
v tidyr 1.1.2 v stringr 1.4.0
v readr 1.3.1 v forcats 0.5.0
-- Conflicts ------------------------------------------ tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()
Warning message:
package ‘tidyverse’ was built under R version 4.0.3
dat <- data.frame(x = sample(1:50),
                  y = sample(1:50))
data
[1] 341 345 338 339 340 343 341 343 341 328 343 347 337
[14] 348 339
subset_dat <- slice(dat, 5:9)
cbind(subset_dat,
      min_x_y = rowMins(subset_dat))
Error in rowMins(subset_dat) : could not find function "rowMins"
You need the package matrixStats for rowMins().
Dear Williaml,
Still an error!
install.packages("matrixStats")
Installing package into ‘C:/Users/Nyein Chan/Documents/R/win-library/4.0’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.0/matrixStats_0.57.0.zip'
Content type 'application/zip' length 1574276 bytes (1.5 MB)
downloaded 1.5 MB
package ‘matrixStats’ successfully unpacked and MD5 sums checked
The downloaded binary packages are in
C:\Users\Nyein Chan\AppData\Local\Temp\RtmpkrqCmg\downloaded_packages
library(matrixStats)
Attaching package: ‘matrixStats’
The following object is masked from ‘package:dplyr’:
count
Warning message:
package ‘matrixStats’ was built under R version 4.0.3
library(tidyverse)
dat <- data.frame(x = sample(1:50),
                  y = sample(1:50))
data
[1] 341 345 338 339 340 343 341 343 341 328 343 347 337
[14] 348 339
subset_dat <- slice(dat, 5:9)
cbind(subset_dat,
      min_x_y = rowMins(subset_dat))
Error in rowMins(subset_dat) : Argument 'x' must be a matrix or a vector.
Are you trying to use data in the slice? It should be subset_dat <- slice(data, 5:9) if you are. Although, not sure what your data is like.
Then notice the as.matrix in the code by @AlexisW, which you haven't included.
cbind(subset_dat,
min_x_y = matrixStats::rowMins(as.matrix(subset_dat)))
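A small sketch of why the conversion matters, assuming subset_dat holds only numeric columns: slice() returns a data frame, which is not a matrix, and that is what the "Argument 'x' must be a matrix or a vector" error is complaining about. The values below are copied from the example output earlier in the thread. The do.call(pmin, ...) line is a base R alternative that avoids the extra package; it is a swapped-in technique, not the code suggested above.

```r
dat <- data.frame(x = c(30, 27, 50, 3, 36),
                  y = c(34, 2, 49, 5, 27))

is.matrix(dat)             # FALSE: a data frame is not a matrix
is.matrix(as.matrix(dat))  # TRUE: this is what rowMins() expects

# Row-wise minimum in base R: pmin() receives each column as a separate vector
row_min <- do.call(pmin, dat)
cbind(dat, min_x_y = row_min)
```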
Anyway, it would be much easier if you could provide a reproducible example:
Oh, yeah! Now I could run it. I forgot to add as.matrix. Thanks indeed. But this code gives the minimum across column x and column y for each row.
In fact, I want to calculate the minimum value for each column. For example, the minimum value of the selected rows (rows 5 to 9) for column x, and the same for column y.
library(tidyverse)
(dat <- data.frame(x = 1:10,
y = 10:1))
(subset_dat <- slice(dat, 3:7))
(summarise_all(subset_dat,min))
?
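For reference, summarise_all() is superseded in dplyr 1.0.0 and later in favour of across(). A minimal sketch of the same calculation in that style, with the result appended as a new row as asked in the original question (col_min is just an illustrative name):

```r
library(dplyr)

dat <- data.frame(x = 1:10, y = 10:1)
subset_dat <- slice(dat, 3:7)

# Column-wise minimum of rows 3 to 7
col_min <- summarise(subset_dat, across(everything(), min))
col_min
#>   x y
#> 1 3 4

# Append the minimum as a new last row
bind_rows(subset_dat, col_min)
```

Swapping min for max or mean gives the other statistics from the original question.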
Thank you, nirgrahamuk! It works well.
Dear @AlexisW @williaml @nirgrahamuk ,
If I want to select two different ranges of rows, how should I code it?
I tried the following, but it does not work.
subset_dat <- slice (dat, 3:5|7:9)
(or)
subset_dat <- slice(dat, c(3:5|7:9))
slice(dat, c(3:5, 7:9))
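The reason the | attempts fail is that | is the element-wise logical OR operator, so 3:5 | 7:9 evaluates to logical values rather than row numbers, whereas c() concatenates the two ranges into one index vector. A short sketch, reusing the example data from earlier in the thread:

```r
library(dplyr)

dat <- data.frame(x = 1:10, y = 10:1)

3:5 | 7:9      # TRUE TRUE TRUE: a logical vector, not row positions
c(3:5, 7:9)    # 3 4 5 7 8 9: the row positions actually wanted

# Select both ranges, then take the column-wise minimum
subset_dat <- slice(dat, c(3:5, 7:9))
two_range_min <- summarise(subset_dat, across(everything(), min))
two_range_min
```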