Assigning Data to their Percentiles

I am working with the R programming language. Suppose I have the following data frame:

var_1 = rnorm(100,10,10)
var_2 = rnorm(100,10,10)
var_3 = rnorm(100,10,10)

d = data.frame(var_1, var_2, var_3)

head(d)


      var_1     var_2      var_3
1 14.251923 14.877801  22.636207
2  7.325137  8.513718  21.021522
3  3.400001 -3.400397  11.274797
4 16.400597  8.623980   9.366115
5  7.065583 13.155570  17.891432
6 21.297912  4.341385 -11.337330

My Question: For each element in each variable, I want to replace the element with the percentile (e.g. 5th, 10th, 15th, etc.) it belongs to.

For example:

a = quantile(d$var_1, c(0.05, 0.10, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1))
b = quantile(d$var_2, c(0.05, 0.10, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1))
c = quantile(d$var_3, c(0.05, 0.10, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1))

new = data.frame(a,b,c)

              a           b          c
5%   -0.8806901 -7.40560488 -4.7353920
10%   0.3595086 -3.77910527 -0.6874766
15%   1.1201300 -2.91946322  0.9584040
20%   3.0581928  0.05127097  2.1457693
25%   5.0901641  1.91719913  4.6997966
30%   7.0056228  2.56215345  6.2691894
35%   7.6089831  3.58688942  7.1900823
40%   8.9853805  5.00957881  7.8488446
45%   9.9264540  5.73653135  8.6135093
50%  10.2235212  7.43425669  9.6063344
55%  11.5707533  8.54160196 10.9239040
60%  13.2422940  9.65006232 11.7036647
65%  15.1076889 11.07081528 13.2440004
70%  16.5354881 12.38804922 15.2585324
75%  17.9336020 13.16121940 17.6656208
80%  19.5312682 15.31472178 18.4820207
85%  21.9264905 17.99689941 19.3347983
90%  24.4511364 20.47478783 22.0647173
95%  26.6820271 25.27082341 24.4473033
100% 41.4419744 39.75848302 34.5105183
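
As an aside, the three quantile() calls above repeat the same probability vector. Below is a minimal sketch of building the same kind of table in one step. The set.seed() call and the name probs are additions for illustration only, and the resulting columns are named var_1, var_2, var_3 rather than a, b, c:

```r
# Probabilities 5%, 10%, ..., 100%, matching the calls above
probs <- seq(0.05, 1, by = 0.05)

set.seed(1)  # assumption: seed added only so this sketch is reproducible
d <- data.frame(var_1 = rnorm(100, 10, 10),
                var_2 = rnorm(100, 10, 10),
                var_3 = rnorm(100, 10, 10))

# sapply() runs quantile() on every column and binds the results into a
# matrix with one row per probability and one column per variable
new <- as.data.frame(sapply(d, quantile, probs = probs))

head(new)
```

Because sapply() iterates over the columns of d, the same line should keep working if more variables are added to the data frame.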

Now, whenever a value falls within a given percentile range, I would like to replace it as follows:

  • if d$var_1 < -0.8806901, then d$var_1 == as.factor("5th percentile")
  • if d$var_1 > -0.8806901 and d$var_1 < 0.3595086, then d$var_1 == as.factor("10th percentile")

...

  • if d$var_1 > 15.1076889 and d$var_1 < 16.5354881, then d$var_1 == as.factor("65th percentile")

etc

  • if d$var_2 < -7.40560488, then d$var_2 == as.factor("5th percentile")

etc

  • if d$var_3 < -4.7353920, then d$var_3 == as.factor("5th percentile")

etc

Can someone please show me how to do this?

Thanks!

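Here is one method. It passes each column's quantiles to cut() as break points. Note that the quantiles here start at 0% (seq(0,1,0.05) rather than starting at 0.05), so the 21 break points define 20 bins and the column minimum falls into the first one.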

var_1 = rnorm(100,10,10)
var_2 = rnorm(100,10,10)
var_3 = rnorm(100,10,10)

d = data.frame(var_1, var_2, var_3)

a = quantile(d$var_1, seq(0,1,0.05))
b = quantile(d$var_2, seq(0,1,0.05))
c = quantile(d$var_3, seq(0,1,0.05))

new = data.frame(a,b,c)

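library(purrr)

# Bin one column (Vals) using its own quantiles (Levels) as break points;
# include.lowest=TRUE keeps the column minimum in the first bin
MyFunc <- function(Vals,Levels){
  cut(x = Vals,breaks=Levels,include.lowest=TRUE,
      labels=paste(seq(5,100,5),"percent"))
}
# Pair each column of d with the matching column of new
Percents <- map2_df(d,new,MyFunc)
head(Percents)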
#> # A tibble: 6 x 3
#>   var_1      var_2      var_3      
#>   <fct>      <fct>      <fct>      
#> 1 30 percent 50 percent 30 percent 
#> 2 30 percent 35 percent 60 percent 
#> 3 75 percent 80 percent 100 percent
#> 4 40 percent 40 percent 90 percent 
#> 5 55 percent 65 percent 15 percent 
#> 6 95 percent 55 percent 5 percent

Created on 2021-12-28 by the reprex package (v2.0.1)
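
For readers who prefer to avoid the purrr dependency, the same idea can be sketched in base R. This is only an illustration under stated assumptions: the helper name to_percentile_bin and the set.seed() call are hypothetical additions, not part of the answer above, and the quantiles are computed inside the helper instead of in a separate data frame.

```r
set.seed(1)  # assumption: seed added only so this sketch is reproducible
d <- data.frame(var_1 = rnorm(100, 10, 10),
                var_2 = rnorm(100, 10, 10),
                var_3 = rnorm(100, 10, 10))

# Hypothetical helper: bin one numeric vector by its own 5%-wide
# quantile ranges and label each bin by its upper percentile
to_percentile_bin <- function(x) {
  breaks <- quantile(x, seq(0, 1, 0.05))   # 0%, 5%, ..., 100%: 21 breaks
  cut(x, breaks = breaks, include.lowest = TRUE,
      labels = paste(seq(5, 100, 5), "percent"))
}

# lapply() applies the helper column by column; as.data.frame() puts the
# resulting factors back into a data frame with the original column names
binned <- as.data.frame(lapply(d, to_percentile_bin))

head(binned)
table(binned$var_1)  # how many observations landed in each bin
```

One caveat: cut() stops with an error when the break points are not unique, which can happen if a column contains many tied values so that neighbouring quantiles coincide. With continuous data such as the rnorm() draws here, that is unlikely to come up.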
