mutate based on categories available in the data

saddam · August 22, 2024, 1:38pm

Hi everyone,

I need your help on this dataset (sample):

region	sector	index
A	health	 0.94 
B	health	 1.04 
C	health	 1.12 
D	health	 1.01 
E	health	 1.01 
F	health	 1.06 
G	health	 1.00 
H	health	 1.21 
I	health	 1.08 
J	health	 1.01 
A	edu	 1.20 
B	edu	 1.12 
C	edu	 1.11 
D	edu	 0.85 
E	edu	 1.06 
F	edu	 1.19 
G	edu	 1.10 
H	edu	 1.06 
I	edu	 0.96 
J	edu	 1.08 
A	trade	 1.09 
B	trade	 0.99 
C	trade	 1.11 
D	trade	 0.93 
E	trade	 1.00 
F	trade	 0.98 
G	trade	 1.05 
H	trade	 1.03 
I	trade	 0.91 
J	trade	 1.05

I would like to add new column (kuartile) in my dataset, the column values is the quartile of the other column (index) based on categories available in sector column using codes as follow:


x <- c(a  =  tes %>% filter(sector=="health"),
        b  =  tes %>% filter(sector=="edu"),
        c  =  tes %>% filter(sector=="trade"))
q = c(0.25, 0.5, 0.75)
tes2 <- for (i in seq_along(x)) 
         {tes %>% mutate(kuartile=(case_when (indeks<=quantile(indeks, probs=q[1]) ~ "q1",
                                             indeks > quantile(indeks, probs=q[1]) & indeks <= quantile(indeks, probs=q[2]) ~ "q2",
                                             indeks > quantile(indeks, probs=q[2]) & indeks <= quantile(indeks, probs=q[3]) ~ "q3",
                                             TRUE ~ "q4")
                                                             ))}

tes2

However the result is NULL.

jrkrideau · August 22, 2024, 2:52pm

I assume tes is your original data set. Why are you creating a list {x}

str(x)
List of 9
 $ a.region: chr [1:10] "A" "B" "C" "D" ...
 $ a.sector: chr [1:10] "health" "health" "health" "health" ...
 $ a.index : num [1:10] 0.94 1.04 1.12 1.01 1.01 1.06 1 1.21 1.08 1.01
 $ b.region: chr [1:10] "A" "B" "C" "D" ...
 $ b.sector: chr [1:10] "edu" "edu" "edu" "edu" ...
 $ b.index : num [1:10] 1.2 1.12 1.11 0.85 1.06 1.19 1.1 1.06 0.96 1.08
 $ c.region: chr [1:10] "A" "B" "C" "D" ...
 $ c.sector: chr [1:10] "trade" "trade" "trade" "trade" ...
 $ c.index : num [1:10] 1.09 0.99 1.11 0.93 1 0.98 1.05 1.03 0.91 1.05

I just do not understand what you are doing.

Edit

Your data set has the column index but you are writing indeks elsewhere.

mduvekot · August 22, 2024, 7:22pm

Perhaps what you want is something like this?

library(tidyverse)

df <- tribble(
  ~region, ~sector,	~index,
  "A", "health",  0.94,
  "B", "health",  1.04,
  "C", "health",  1.12, 
  "D", "health",  1.01, 
  "E", "health",  1.01, 
  "F", "health",  1.06, 
  "G", "health",  1.00, 
  "H", "health",  1.21, 
  "I", "health",  1.08, 
  "J", "health",  1.01, 
  "A", "edu",  1.20, 
  "B", "edu",  1.12, 
  "C", "edu",  1.11, 
  "D", "edu",  0.85, 
  "E", "edu",  1.06, 
  "F", "edu",  1.19, 
  "G", "edu",  1.10, 
  "H", "edu",  1.06, 
  "I", "edu",  0.96, 
  "J", "edu",  1.08, 
  "A", "trade",  1.09, 
  "B", "trade",  0.99, 
  "C", "trade",  1.11, 
  "D", "trade",  0.93, 
  "E", "trade",  1.00, 
  "F", "trade",  0.98, 
  "G", "trade",  1.05, 
  "H", "trade",  1.03, 
  "I", "trade",  0.91, 
  "J", "trade",  1.05,
)

tes <- mutate(df, 
       .by = sector,
       quantile = cut(
           index, 
           breaks = quantile(
             index,  
             probs = seq(0, 1, 0.25)
             ),
           include.lowest = TRUE,
           )
       )

saddam · August 22, 2024, 11:03pm

Thanks alot bro, I really appreciate it..

system · September 1, 2024, 2:07pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.