I have a table with three numeric variables and one logical variable with two values (TRUE/FALSE).
I need the minimum, maximum, mean, median, and first and third quartiles of every numeric variable, for each value of the logical variable.
How can I do this in a single call instead of writing code for each variable?
Here is the data frame:
# A tibble: 10 × 4
a b c d
<int> <int> <lgl> <int>
1 1 1 TRUE 1
2 2 2 FALSE 2
3 3 3 TRUE 3
4 4 4 FALSE 4
5 5 5 TRUE 5
6 6 6 FALSE 6
7 7 7 TRUE 7
8 8 8 FALSE 8
9 9 9 TRUE 9
10 10 10 FALSE 10
structure(list(a = 1:10, b = 1:10, c = c(TRUE, FALSE, TRUE, FALSE,
TRUE, FALSE, TRUE, FALSE, TRUE, FALSE), d = 1:10), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -10L))
Hi @juandmaz, I found a possible solution. Is this what you want? It is mainly based on the following package:
doBy: Groupwise Statistics, LSmeans, Linear Estimates, Utilities (r-project.org)
install.packages("doBy")  # only needed once
library(doBy)
#> Warning: package 'doBy' was built under R version 4.2.3
data188 <- structure(list(a = 1:10, b = 1:10, c = c(TRUE, FALSE, TRUE, FALSE,
TRUE, FALSE, TRUE, FALSE, TRUE, FALSE), d = 1:10), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -10L))
# summary() gives Min., 1st Qu., Median, Mean, 3rd Qu. and Max. for each of a, b and d, within each value of c
data189 <- summaryBy(a + b + d ~ c, data188, FUN = summary)
data189
#> # A tibble: 2 × 19
#> c a.Min. `a.1st Qu.` a.Median a.Mean a.3rd…¹ a.Max. b.Min. b.1st…² b.Med…³
#> <lgl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 FALSE 2 4 6 6 8 10 2 4 6
#> 2 TRUE 1 3 5 5 7 9 1 3 5
#> # … with 9 more variables: b.Mean <dbl>, `b.3rd Qu.` <dbl>, b.Max. <dbl>,
#> # d.Min. <dbl>, `d.1st Qu.` <dbl>, d.Median <dbl>, d.Mean <dbl>,
#> # `d.3rd Qu.` <dbl>, d.Max. <dbl>, and abbreviated variable names
#> # ¹`a.3rd Qu.`, ²`b.1st Qu.`, ³b.Median
Created on 2023-07-11 with reprex v2.0.2
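For comparison, here is a base R sketch that should give the same six statistics without installing doBy. It swaps in `aggregate()` with a `cbind()` formula (this is not how doBy works internally), and the data frame `dat` is just the example data rebuilt by hand:

```r
# Base R only: no extra packages needed.
# Example data rebuilt as a plain data frame (same values as the dput() above).
dat <- data.frame(
  a = 1:10,
  b = 1:10,
  c = rep(c(TRUE, FALSE), times = 5),
  d = 1:10
)

# cbind() on the left-hand side makes aggregate() apply summary() to each of
# a, b and d separately within each value of c. Because summary() returns six
# values, each of a, b and d becomes a matrix column whose column names are
# summary()'s labels (Min., 1st Qu., Median, Mean, 3rd Qu., Max.).
res <- aggregate(cbind(a, b, d) ~ c, data = dat, FUN = summary)

res
res$a  # statistics for a: one row per value of c
```

Each of `res$a`, `res$b` and `res$d` is a 2 × 6 matrix (one row for FALSE, one for TRUE), so the printed layout differs from the doBy output, but the numbers should match it.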
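Another option uses dplyr's `across()` to apply the same list of functions to every numeric column:
library(tidyverse)

some_data <- structure(list(a = 1:10, b = 1:10, c = c(
  TRUE, FALSE, TRUE, FALSE,
  TRUE, FALSE, TRUE, FALSE, TRUE, FALSE
), d = 1:10), class = c(
  "tbl_df",
  "tbl", "data.frame"
), row.names = c(NA, -10L))

# One row per value of c; across() applies every function in the list to each
# numeric column, giving columns such as a_min, a_max, ..., a_q_high
(wide_version <- some_data |>
  group_by(c) |>
  summarise(
    across(where(is.numeric),
      .fns = list(
        min = min,
        max = max,
        mean = mean,
        median = median,
        q_low = ~ quantile(.x, probs = .25),  # first quartile
        q_high = ~ quantile(.x, probs = .75)  # third quartile
      )
    )
  ))

# Reshape: one row per variable/statistic pair, one column per value of c
(long_version <- pivot_longer(wide_version, cols = -c) |>
  pivot_wider(names_from = c))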
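If you would rather have the variable and the statistic in separate columns, a possible variation on the code above is to use `.names` to put a dot between them and then split on that dot when pivoting. This is a sketch assuming dplyr 1.0 or later (for `across()`), tidyr 1.0 or later (for `names_sep`) and R 4.1 or later (for `|>`); the statistic names `q1` and `q3` are my own choice:

```r
library(dplyr)
library(tidyr)

# Same example data, built directly as a tibble
some_data <- tibble(
  a = 1:10,
  b = 1:10,
  c = rep(c(TRUE, FALSE), times = 5),
  d = 1:10
)

stats_long <- some_data |>
  group_by(c) |>
  summarise(
    across(
      where(is.numeric),
      list(
        min = min,
        q1 = ~ quantile(.x, probs = 0.25),
        median = median,
        mean = mean,
        q3 = ~ quantile(.x, probs = 0.75),
        max = max
      ),
      # "." separates variable from statistic; none of the names contain a dot
      .names = "{.col}.{.fn}"
    )
  ) |>
  # Split e.g. "a.q1" into variable = "a" and statistic = "q1"
  pivot_longer(-c, names_to = c("variable", "statistic"), names_sep = "\\.") |>
  # One column per value of the logical grouping variable
  pivot_wider(names_from = c, values_from = value)

stats_long
```

The result has one row per variable/statistic pair (18 rows here) plus a `FALSE` and a `TRUE` column. A dot is used as the separator because the default `{.col}_{.fn}` names would be ambiguous to split when a statistic name such as `q_low` itself contains an underscore.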
@juandmaz You are welcome! If I solved your problem, would you mind giving my post a like or clicking the Solution button on it? Thanks a lot.
(Quoting Comede_way's doBy solution above.)
Closed by system on July 19, 2023, 5:37am.
This topic was automatically closed 7 days after the last reply. New replies are no longer allowed. If you have a query related to it or one of the replies, start a new topic and refer back with a link.