Below is a function that returns quantiles for all numeric columns in a data frame. This version adds the quantile names separately at the end of the pipe. At the end of this post is another version that extracts the quantile names within the pipe, directly from the named output of the quantile function. Although the second approach seems more tidyverse-ish, it also seems harder to reason about. I'd be interested in a more transparent way to extract the quantile names if anyone can suggest one.
library(tidyverse)

quantile_summary = function(data, group, probs=seq(0,1,0.25), ...) {
  group = enquo(group)
  q.names = paste0(probs*100, "%")
  data %>%
    group_by(!!group) %>%
    summarise_if(is.numeric, funs(list(quantile(., probs=probs, ...)))) %>%
    unnest() %>%
    # Add column with quantile names. Make quantile names a factor so that
    # they'll be ordered correctly when sorted
    group_by(!!group) %>%
    mutate(quantiles = factor(q.names, levels=q.names)) %>%
    # Reorder columns to put quantile names second
    select(!!group, quantiles, everything())
}
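To see why the quantile names are converted to a factor, here is a minimal base R sketch. It rebuilds q.names for the default probs, exactly as the function does. The method = "radix" argument is an addition for this illustration only, so that the character sort does not depend on the locale.

```r
# Quantile labels as built inside quantile_summary() for the default probs
probs <- seq(0, 1, 0.25)
q.names <- paste0(probs * 100, "%")

# Sorted as plain character strings, "100%" lands between "0%" and "25%"
chr_sorted <- sort(q.names, method = "radix")
print(chr_sorted)

# Sorted as a factor whose levels follow probs, the labels keep numeric order
fct_sorted <- sort(factor(q.names, levels = q.names))
print(as.character(fct_sorted))

# For these probs the labels also match the names that quantile() generates
print(identical(q.names, names(quantile(1:10, probs = probs))))
```

The last line is only a check for the default probs; it does not show that the two ways of building the labels agree for every possible value of probs.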
iris %>%
  quantile_summary(Species)
quantile_summary of iris data frame
   Species    quantiles Sepal.Length Sepal.Width Petal.Length Petal.Width
   <fct>      <fct>            <dbl>       <dbl>        <dbl>       <dbl>
 1 setosa     0%                4.3         2.3          1            0.1
 2 setosa     25%               4.8         3.2          1.4          0.2
 3 setosa     50%               5           3.4          1.5          0.2
 4 setosa     75%               5.2         3.68         1.58         0.3
 5 setosa     100%              5.8         4.4          1.9          0.6
 6 versicolor 0%                4.9         2            3            1
 7 versicolor 25%               5.6         2.52         4            1.2
 8 versicolor 50%               5.9         2.8          4.35         1.3
 9 versicolor 75%               6.3         3            4.6          1.5
10 versicolor 100%              7           3.4          5.1          1.8
11 virginica  0%                4.9         2.2          4.5          1.4
12 virginica  25%               6.22        2.8          5.1          1.8
13 virginica  50%               6.5         3            5.55         2
14 virginica  75%               6.9         3.18         5.88         2.3
15 virginica  100%              7.9         3.8          6.9          2.5
mtcars %>%
  quantile_summary(cyl, probs=c(0.25, 0.5, 0.75))
quantile_summary of mtcars data frame
    cyl quantiles   mpg  disp    hp  drat    wt  qsec    vs    am  gear  carb
  <dbl> <fct>     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     4 25%        22.8  78.8  65.5  3.81  1.88  18.6     1   0.5   4    1
2     4 50%        26   108    91    4.08  2.2   18.9     1   1     4    2
3     4 75%        30.4 121.    96    4.16  2.62  20.0     1   1     4    2
4     6 25%        18.6 160   110    3.35  2.82  16.7     0   0     3.5  2.5
5     6 50%        19.7 168.  110    3.9   3.22  18.3     1   0     4    4
6     6 75%        21   196.  123    3.91  3.44  19.2     1   1     4    4
7     8 25%        14.4 302.  176.   3.07  3.53  16.1     0   0     3    2.25
8     8 50%        15.2 350.  192.   3.12  3.76  17.2     0   0     3    3.5
9     8 75%        16.2 390   241.   3.22  4.01  17.6     0   0     3    4
starwars %>%
  quantile_summary(species, na.rm=TRUE, probs=0.5)
quantile_summary of starwars data frame
   species   quantiles height  mass birth_year
   <chr>     <fct>      <dbl> <dbl>      <dbl>
 1 NA        50%         180.  48           62
 2 Aleena    50%          79   15           NA
 3 Besalisk  50%         198  102           NA
 4 Cerean    50%         198   82           92
 5 Chagrian  50%         196   NA           NA
 6 Clawdite  50%         168   55           NA
 7 Droid     50%         132   53.5         33
 8 Dug       50%         112   40           NA
 9 Ewok      50%          88   20            8
10 Geonosian 50%         183   80           NA
# … with 28 more rows
Here is another approach that extracts the quantile names within the pipe directly from the output of quantile. It works, but the code seems like it would probably be hard to follow, given the use of map and multiple "pronouns" within mutate. Perhaps there's a simpler approach.
quantile_summary2 = function(data, group, probs=seq(0,1,0.25), ...) {
  group = enquo(group)
  data %>%
    group_by(!!group) %>%
    summarise_if(is.numeric, funs(list(quantile(., probs=probs, ...)))) %>%
    # Add quantile names
    # Get quantile names from the last column since this will always be a
    # nested list of summary values (assuming there is at least one numeric
    # column in the input data frame)
    mutate(quantiles = .[[ncol(.)]] %>%
             map(~factor(names(.x), levels=names(.x)))) %>%
    unnest() %>%
    # Reorder columns to put quantile names second
    select(!!group, quantiles, everything())
}
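To make the mutate step in quantile_summary2 easier to follow, here is a minimal base R sketch of what it does, outside of any pipe. The list last_col is a hypothetical stand-in for the list-column that summarise_if creates (one named quantile vector per group), and lapply plays the role of map. It illustrates the mechanics only; it is not a replacement for the function.

```r
# Hypothetical stand-in for .[[ncol(.)]]: one named quantile vector per group
probs <- c(0.25, 0.5, 0.75)
last_col <- list(
  quantile(c(1, 2, 3, 4, 5), probs = probs),
  quantile(c(10, 20, 30), probs = probs)
)

# quantile() names its output after probs, so the labels can be read off with names()
print(names(last_col[[1]]))

# Equivalent of map(~factor(names(.x), levels=names(.x))), written with lapply():
# each element becomes a factor whose level order follows the order of probs
q_factors <- lapply(last_col, function(x) factor(names(x), levels = names(x)))
print(levels(q_factors[[2]]))
```

Because every group is summarised with the same probs, each element of q_factors carries the same labels, which is why taking the names from any one list-column (here the last one) is enough.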