# How to calculate percentiles for many subsets and put in a vector

How can I calculate a percentile separately for each year across several years of data? A small sample is below; my real data set has more than 10 years of daily values.

    DF = data.frame(
      year  = c(1980, 1980, 1980, 1980, 1981, 1981, 1981, 1981),
      month = c(12, 12, 12, 12, 1, 1, 1, 1),
      day   = c(28, 29, 30, 31, 1, 2, 3, 4),
      value = c(0.60, 0.21, 0.43, 0.44, 0.24, 0.29, 0.21, 0.29))

I want to calculate the 20th percentile of the values in each year. I only know how to use sum, mean, max, and min with the aggregate() function. For example, the sum of the values in each year is calculated as

    value.yr = aggregate(value~year, sum, data=DF)

The 20th percentile for the year 1980 alone is:

    value.1980 = quantile(subset(DF, year==1980)$value, 0.2)

But if I want to calculate the 20th percentile of the values in each year and put the results in a vector, how should I adapt the code? I tried something like this, which does not work:

    value.20th = aggregate(value~year, FUN = quantile(0.2), data=DF)

Thanks for any help.

Is this what you want to do?

```r
DF = data.frame(
  year  = c(1980, 1980, 1980, 1980, 1981, 1981, 1981, 1981),
  month = c(12, 12, 12, 12, 1, 1, 1, 1),
  day   = c(28, 29, 30, 31, 1, 2, 3, 4),
  value = c(0.60, 0.21, 0.43, 0.44, 0.24, 0.29, 0.21, 0.29))
library(dplyr)

DF %>%
  group_by(year) %>%
  summarise(value.20th = quantile(value, 0.2))
#> # A tibble: 2 x 2
#>    year value.20th
#>   <dbl>      <dbl>
#> 1  1980      0.342
#> 2  1981      0.228
```

Created on 2019-06-11 by the reprex package (v0.3.0)
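
Since the question started from `aggregate()`, here is a minimal base R sketch of the same calculation. It relies on `aggregate()` forwarding extra arguments to `FUN`, so `probs = 0.2` reaches `quantile()`; the attempt `FUN = quantile(0.2)` calls `quantile()` immediately instead of passing the function itself. The names `value.yr20` and `value.20th` are only illustrative.

```r
DF <- data.frame(
  year  = c(1980, 1980, 1980, 1980, 1981, 1981, 1981, 1981),
  month = c(12, 12, 12, 12, 1, 1, 1, 1),
  day   = c(28, 29, 30, 31, 1, 2, 3, 4),
  value = c(0.60, 0.21, 0.43, 0.44, 0.24, 0.29, 0.21, 0.29))

# Pass the function itself; probs = 0.2 is forwarded to quantile()
value.yr20 <- aggregate(value ~ year, data = DF, FUN = quantile, probs = 0.2)

# tapply() returns the per-year results directly as a vector named by year
value.20th <- tapply(DF$value, DF$year, quantile, probs = 0.2)
```

If only the numbers are needed from the `aggregate()` result, `value.yr20$value` should give them as a plain vector.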

You will see a lot of references to the `tidyverse` here, and I hope this solution encourages you to dig into it.

```r
library(tidyverse)
DF = data.frame(
  year  = c(1980, 1980, 1980, 1980, 1981, 1981, 1981, 1981),
  month = c(12, 12, 12, 12, 1, 1, 1, 1),
  day   = c(28, 29, 30, 31, 1, 2, 3, 4),
  value = c(0.60, 0.21, 0.43, 0.44, 0.24, 0.29, 0.21, 0.29))

# One row per year, holding that year's 20th percentile
DF %>% group_by(year) %>% summarize(year_20Q = quantile(value, 0.2))
#> # A tibble: 2 x 2
#>    year year_20Q
#>   <dbl>    <dbl>
#> 1  1980    0.342
#> 2  1981    0.228
```

Didn't mean to step on your answer to the quantile question. You just beat me to the draw!

Richard
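
The original question also asked for the results in a vector, whereas `summarise()` returns a tibble. One possible way to get a plain vector is to finish the pipeline with `pull()`. This is only a sketch: the column name `q20` is made up for illustration, and `names = FALSE` is used so that `quantile()` drops its `20%` label.

```r
library(dplyr)

DF <- data.frame(
  year  = c(1980, 1980, 1980, 1980, 1981, 1981, 1981, 1981),
  value = c(0.60, 0.21, 0.43, 0.44, 0.24, 0.29, 0.21, 0.29))

value.20th <- DF %>%
  group_by(year) %>%
  summarise(q20 = quantile(value, 0.2, names = FALSE)) %>%
  pull(q20)   # extract the column as an ordinary numeric vector
```

The values come back in the same order as the years in the summarised table.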

Thanks for all your help. It works now. By the way, what is the difference between the dplyr and tidyverse packages?

`dplyr` is part of the `tidyverse`; the latter is a kind of aggregator (a meta-package) for a family of packages.
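
In practice this means that either `library()` call makes `group_by()` and `summarise()` available. A small sketch, assuming the `tidyverse` package is installed, that checks the relationship with `tidyverse_packages()`:

```r
# Loading dplyr alone is enough for group_by() and summarise()
library(dplyr)

# tidyverse_packages() lists the packages bundled under the tidyverse
pkgs <- tidyverse::tidyverse_packages()
dplyr_in_tidyverse <- "dplyr" %in% pkgs
```

Loading only `dplyr` keeps the session lighter, while `library(tidyverse)` attaches the core packages of the family in one call.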

If your question's been answered (even if by you), would you mind choosing a solution? (See FAQ below for how).

Having questions checked as resolved makes it a bit easier to navigate the site visually and see which threads still need help.

Thanks
